AES San Francisco 2010
Paper Session P2
P2 - Speech Processing
Thursday, November 4, 9:30 am — 12:30 pm (Room 236)
Chair: John Strawn
P2-1 Language Scrambling for In-Game Voice-Chat Applications—Nicolas Tsingos, Charles Robinson, Dolby Laboratories - San Francisco, CA, USA
We propose a solution to enable speech-driven alien language synthesis in an in-game voice-chat system. Our technique selectively replaces the users’ input speech with a corresponding alien language output synthesized on-the-fly. It is optimized for a client-server architecture and uses a concatenative synthesis framework. To limit memory requirements, as well as preserve forwarding capabilities on the server, the concatenative synthesis is performed in the coded domain. For gaming applications, our approach can be used to selectively scramble the speech of opposing team members in order to provide compelling in-game voice feedback without exposing their strategy. The system has been implemented with multiple alien races in a virtual environment with effective, entertaining results.
Convention Paper 8161
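The abstract describes concatenative synthesis performed in the coded domain, so the server can forward frames without decoding them. A minimal sketch of that idea follows; the three energy classes, the byte-averaging energy proxy, and the tiny pre-encoded alien frame inventory are all hypothetical stand-ins, since the paper's codec features and unit inventory are not given in the abstract.

```python
import random

# Pre-encoded alien-language frames grouped by a coarse energy class.
# In a real system these would be codec frames captured offline from
# recorded alien-language material.
ALIEN_FRAMES = {
    "silence": [b"\x00" * 20],
    "low":     [b"\x10" * 20, b"\x11" * 20],
    "high":    [b"\x20" * 20, b"\x21" * 20, b"\x22" * 20],
}

def energy_class(coded_frame: bytes) -> str:
    """Classify a coded frame by a cheap energy proxy.

    A real implementation would read gain parameters directly from the
    codec bitstream; averaging byte values here merely stands in for that.
    """
    level = sum(coded_frame) / len(coded_frame)
    if level < 8:
        return "silence"
    return "low" if level < 32 else "high"

def scramble_stream(coded_frames):
    """Swap each coded frame for an alien frame of matching energy class,
    preserving the original frame timing one-for-one."""
    return [random.choice(ALIEN_FRAMES[energy_class(f)])
            for f in coded_frames]
```

Because both input and output stay in the coded domain, the server's memory cost is one small unit inventory per alien race, and forwarding to listeners is unchanged.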
P2-2 Speech Referenced Limiting: Controlling the Loudness of a Signal with Reference to its Speech Loudness—Michael Fisher, Nicky Chong-White, The HEARing CRC - Melbourne, Victoria, Australia, and National Acoustic Laboratories - Sydney, NSW, Australia; Harvey Dillon, National Acoustic Laboratories - Sydney, NSW, Australia
A novel method of sound amplitude limiting for signals conveying speech is presented. The method uses the frequency-specific levels of the speech conveyed by the signal to generate a set of time-varying speech reference levels. It limits the level of sounds conveyed by the signal to these speech reference levels. The method is called speech referenced limiting (SRL). It provides minimal limiting of speech while providing greater control over the loudness of non-speech sounds compared to conventional (fixed threshold) limiters. It is appropriate for use in applications where speech is the primary signal of interest such as telephones, computers, amplified hearing protectors, and hearing aids. The effect of SRL on speech and non-speech sounds is presented.
Convention Paper 8162
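As described, SRL derives time-varying, frequency-specific reference levels from the speech in the signal and limits all sounds to those references. Below is a minimal sketch of that control loop, assuming per-band magnitudes from some front-end filter bank, an external voice activity detector, and invented smoothing and headroom constants; none of these details are from the paper.

```python
import numpy as np

def srl(band_mags, is_speech, margin_db=6.0, alpha=0.99):
    """Speech referenced limiting, sketched per frame and band.

    band_mags:  (n_frames, n_bands) magnitudes from a filter bank.
    is_speech:  boolean per frame from some voice activity detector.
    Returns the limited magnitudes.
    """
    n_frames, n_bands = band_mags.shape
    ref = np.full(n_bands, 1e-6)          # running speech reference level
    limit_gain = 10 ** (margin_db / 20)   # headroom above the reference
    out = np.empty_like(band_mags)
    for t in range(n_frames):
        if is_speech[t]:
            # Track the speech level per band with a slow one-pole smoother.
            ref = alpha * ref + (1 - alpha) * band_mags[t]
        # Limit every sound (speech or not) to the speech-derived ceiling.
        out[t] = np.minimum(band_mags[t], ref * limit_gain)
    return out
```

Speech itself sits near the reference and is barely touched, while louder non-speech sounds are pinned to speech-like levels, which matches the behavior claimed relative to fixed-threshold limiters.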
P2-3 Individually Adjustable Signal Processing Algorithms for Improved Speech Intelligibility with Cochlear Implants—Isabell Kiral-Kornek, Bernd Edler, Jörn Ostermann, Leibniz Universität Hannover - Hannover, Germany; Andreas Büchner, Hörzentrum Hannover, Hannover Medical School - Hannover, Germany
The development of cochlear implants (CIs) has made it possible to treat certain hearing impairments and even deafness. Until now, however, individual adjustment of the speech processing within a cochlear implant has been limited to unequal amplification of different frequency bands. Because speech perception among patients differs in more than just loudness, there is room for improvement in individually adjustable signal processing. A novel approach is presented that aims to increase intelligibility by extending common speech recognition tests, allowing the speech processing to be optimized to each patient's needs.
Convention Paper 8163
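The abstract implies a closed tuning loop in which candidate processor settings are scored by an extended speech recognition test. A sketch of such a loop follows; run_intelligibility_test is a purely hypothetical stand-in for a patient test session, and the per-band gain grid is invented.

```python
import itertools

def run_intelligibility_test(band_gains):
    """Placeholder: in practice this would return the patient's score on
    an extended word or sentence test heard through these settings."""
    return -sum((g - 1.0) ** 2 for g in band_gains)   # toy objective

def tune_band_gains(n_bands=4, candidates=(0.5, 1.0, 2.0)):
    """Exhaustively score every per-band gain combination and keep the
    best; real fittings would use far fewer, adaptively chosen trials."""
    best_gains, best_score = None, float("-inf")
    for gains in itertools.product(candidates, repeat=n_bands):
        score = run_intelligibility_test(gains)
        if score > best_score:
            best_gains, best_score = gains, score
    return best_gains
```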
P2-4 Objective Evaluation of Wideband Speech Codecs for Bluetooth Voice Communication—Gary Spittle, CSR - Cambridge Silicon Radio - Cambridge, UK; Jacek Spiewla, Walter Kargus, Walter Zuluaga, Xuejing Sun, CSR - Cambridge Silicon Radio - Detroit, MI, USA
Bluetooth devices that stream audio are becoming increasingly popular. User expectations have risen, and the resulting requirements on wireless audio devices using Bluetooth present a number of challenges. This paper discusses the impact of various adverse connection conditions on wideband speech, measured in terms of speech quality and intelligibility. A detailed understanding of the various forms of degradation allows proper solutions to be provided.
Convention Paper 8164
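In the spirit of the evaluation the abstract describes, the sketch below degrades a stream of speech frames with random frame loss and scores the result at several loss rates. The loss model, the rates, and the toy quality_score are assumptions; an actual study would encode and decode with real Bluetooth wideband codecs and score with an objective metric such as wideband PESQ.

```python
import random

def apply_frame_loss(frames, loss_rate, seed=0):
    """Drop frames with probability loss_rate, substituting silence."""
    rng = random.Random(seed)
    silent = [0.0] * len(frames[0])
    return [silent if rng.random() < loss_rate else f for f in frames]

def quality_score(ref, test):
    """Toy stand-in for an objective speech quality metric: the fraction
    of frames that survived intact."""
    return sum(r == t for r, t in zip(ref, test)) / len(ref)

def evaluate(frames, loss_rates=(0.0, 0.01, 0.05, 0.10)):
    """Score the degraded stream at each adverse-connection setting."""
    return {rate: quality_score(frames, apply_frame_loss(frames, rate))
            for rate in loss_rates}
```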
P2-5 Speech Synthesis Controlled by Eye Gazing—Andrzej Czyzewski, Kuba Lopatka, Bartosz Kunka, Rafal Rybacki, Bozena Kostek, Gdansk University of Technology - Gdansk, Poland
A method of communication based on eye-gaze control is presented. Investigations of gaze tracking have been carried out in various application contexts. The solution proposed in this paper, which could be described as "talking by eyes," offers an innovative approach in the domain of speech synthesis. The application is dedicated to disabled people, especially persons with so-called locked-in syndrome who cannot talk or move any part of their body. The paper describes a methodology for determining the fixation point on a computer screen and then presents the concatenative speech synthesis algorithm used in the engineered solution. An analysis of working with the system is provided, and conclusions on system characteristics are included.
Convention Paper 8165
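Fixation-point detection of the kind the abstract mentions is commonly implemented with a dwell-time criterion: the gaze must stay within a small radius for long enough. A minimal detector along those lines follows; the 100-pixel radius, 60 Hz sampling rate, and 500 ms dwell threshold are assumed values, not the paper's.

```python
import math

def detect_fixation(samples, radius=100.0, sample_rate_hz=60, dwell_s=0.5):
    """samples: iterable of (x, y) gaze points in screen pixels.
    Returns the fixation centroid, or None if no fixation occurred."""
    needed = int(dwell_s * sample_rate_hz)
    run = []
    for p in samples:
        run.append(p)
        cx = sum(x for x, _ in run) / len(run)
        cy = sum(y for _, y in run) / len(run)
        if any(math.hypot(x - cx, y - cy) > radius for x, y in run):
            run = [p]                 # gaze moved on; restart the dwell
        elif len(run) >= needed:
            return (cx, cy)           # fixation: map to an on-screen key
    return None
```

In a gaze-driven communicator, the returned centroid would be mapped to a letter or phrase on screen, which the concatenative synthesizer then speaks.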
P2-6 Voice Samples Recording and Speech Quality Assessment for Forensic and Automatic Speaker Identification—Andrey Barinov, Speech Technology Center Ltd. - Saint Petersburg, Russia
The task of speaker recognition or speaker identification has become very important in our digital world. Most law enforcement organizations use either automatic or manual speaker identification tools in their investigations. In either case, before carrying out the identification analysis, they usually need to record a voice sample from the suspect, either for a one-to-one comparison or to add to a database. In this paper we describe the parameters of a speech signal that are important for speaker identification performance, propose approaches to quality assessment, and provide practical recommendations for capturing a high-quality voice sample acceptable for speaker identification. The material in this paper may be useful for both software/hardware developers and forensic practitioners.
Convention Paper 8166
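Quality checks of the kind the paper motivates can be sketched as simple signal measurements: net duration, clipping incidence, and a crude SNR estimate. The thresholds below (30 s, 1% clipped samples, 15 dB SNR) are illustrative assumptions, not the paper's recommendations.

```python
import numpy as np

def assess_sample(x, fs):
    """x: numpy array of samples in [-1, 1]; fs: sample rate in Hz.
    Returns a small report of pass/fail quality checks."""
    duration_s = len(x) / fs
    clipped_ratio = float(np.mean(np.abs(x) >= 0.999))
    # Crude SNR: 20 ms frame energies; the loudest decile approximates
    # speech, the quietest decile approximates the noise floor.
    frame = int(0.02 * fs)
    energies = np.array([np.mean(x[i:i + frame] ** 2)
                         for i in range(0, len(x) - frame, frame)])
    energies.sort()
    decile = max(1, len(energies) // 10)
    noise = np.mean(energies[:decile]) + 1e-12
    speech = np.mean(energies[-decile:]) + 1e-12
    snr_db = 10 * np.log10(speech / noise)
    return {
        "duration_ok": duration_s >= 30.0,
        "clipping_ok": clipped_ratio < 0.01,
        "snr_ok": snr_db >= 15.0,
        "snr_db": float(snr_db),
    }
```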