AES London 2011
Poster Session P15

P15 - Speech and Coding


Sunday, May 15, 11:30 — 13:00 (Room: Foyer)

P15-1 A New Robust Hybrid Acoustic Echo Cancellation Algorithm for Speech Communication in Car Environments
Yi Zhou, Chengshi Zheng, Xiaodong Li, Chinese Academy of Sciences - Beijing, China
This paper studies a new robust hybrid adaptive filtering algorithm for acoustic echo cancellation (AEC) in a car speech communication system. The proposed algorithm integrates the affine projection algorithm (APA) and the normalized least mean square (NLMS) algorithm. It switches between them using a coefficient derivative-based technique to achieve optimum overall convergence performance. To combat the impulsive interference widely encountered in car environments and to improve system robustness during double-talk periods, a robust statistics technique is also incorporated into the proposed algorithm. Experiments are conducted to verify the improved and robust overall AEC convergence performance achieved by the new algorithm.
Convention Paper 8407
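
As a rough illustration of one building block named in the P15-1 abstract, the following is a minimal sketch of a plain NLMS echo canceller. It does not reproduce the paper's hybrid APA/NLMS switching or its robust statistics step. The function names, the 4-tap echo path, the step size, and the noise-free white far-end signal are all assumptions made for this example.

```python
import random


def nlms_echo_cancel(far_end, mic, num_taps=8, step=0.5, eps=1e-8):
    """Plain NLMS adaptive filter; returns (residual signal, final weights)."""
    weights = [0.0] * num_taps
    history = [0.0] * num_taps  # recent far-end samples, newest first
    errors = []
    for x, d in zip(far_end, mic):
        history = [x] + history[:-1]
        estimate = sum(w * h for w, h in zip(weights, history))
        e = d - estimate  # residual echo after cancellation
        energy = sum(h * h for h in history) + eps
        gain = step * e / energy  # step size normalised by input energy
        weights = [w + gain * h for w, h in zip(weights, history)]
        errors.append(e)
    return errors, weights


def simulate_echo(far_end, echo_path):
    """Convolve the far-end signal with a hypothetical FIR echo path."""
    return [
        sum(h * far_end[n - k] for k, h in enumerate(echo_path) if n - k >= 0)
        for n in range(len(far_end))
    ]


rng = random.Random(0)
far_end = [rng.uniform(-1.0, 1.0) for _ in range(2000)]
echo_path = [0.6, -0.3, 0.1, 0.05]  # illustrative values, not from the paper
mic = simulate_echo(far_end, echo_path)
errors, weights = nlms_echo_cancel(far_end, mic)
```

In this idealised setting (no near-end speech, no noise) the weights converge to the simulated echo path. Double talk and impulsive interference, which the paper targets, would disturb exactly this kind of unprotected update.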

P15-2 Speech Source Separation Using a Multi-Pitch Harmonic Product Spectrum-Based Algorithm
Rehan Ahmed, Roberto Gil-Pita, David Ayllón, Lorena Álvarez, University of Alcalá - Alcalá de Henares, Spain
This paper presents an efficient algorithm for separating speech signals by determining multiple pitches from mixtures of signals and assigning each source to one of the estimated pitches. The pitch detection algorithm is based on the Harmonic Product Spectrum. Since the pitch of speech signals fluctuates readily, a frame-based algorithm is used to extract the multiple pitches in each frame. The fundamental frequency (pitch) of each source is then estimated and tracked by comparing all the frames. The estimated fundamental frequencies of the sources are used to generate a set of binary masks that allow the signals to be separated in the Short Time Fourier Transform domain. Results show considerable separation of the speech signals, supporting the feasibility of the proposed method.
Convention Paper 8408
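
The Harmonic Product Spectrum mentioned in the P15-2 abstract can be sketched for a single frame and a single pitch as follows. This is only the textbook form of the technique, not the paper's multi-pitch tracking or mask generation. The naive DFT, the Hann window, the choice of three harmonics, the 60 Hz lower bound, and the synthetic test frame are assumptions made for this example.

```python
import cmath
import math


def magnitude_spectrum(frame):
    """Hann-windowed DFT magnitudes for bins 0 .. N/2 - 1 (naive O(N^2) DFT)."""
    n = len(frame)
    windowed = [
        s * (0.5 - 0.5 * math.cos(2 * math.pi * i / n)) for i, s in enumerate(frame)
    ]
    return [
        abs(sum(s * cmath.exp(-2j * math.pi * k * i / n) for i, s in enumerate(windowed)))
        for k in range(n // 2)
    ]


def hps_pitch(frame, sample_rate, num_harmonics=3, min_freq=60.0):
    """Estimate one pitch as the bin maximising the product of harmonic magnitudes."""
    mags = magnitude_spectrum(frame)
    n = len(frame)
    k_min = max(1, math.ceil(min_freq * n / sample_rate))
    k_max = (len(mags) - 1) // num_harmonics  # keep the highest harmonic in range
    best_bin, best_score = k_min, -1.0
    for k in range(k_min, k_max + 1):
        score = 1.0
        for r in range(1, num_harmonics + 1):
            score *= mags[r * k]  # multiply magnitudes at k, 2k, 3k, ...
        if score > best_score:
            best_bin, best_score = k, score
    return best_bin * sample_rate / n


# Synthetic voiced frame: 200 Hz fundamental with three weaker overtones.
sample_rate = 8000
frame = [
    sum(a * math.sin(2 * math.pi * 200 * h * i / sample_rate)
        for h, a in ((1, 1.0), (2, 0.5), (3, 0.25), (4, 0.125)))
    for i in range(400)
]
estimate = hps_pitch(frame, sample_rate)
```

A multi-pitch variant would have to pick several peaks per frame and track them across frames, as the abstract describes. That step is not attempted here.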

P15-3 User-Orientated Subjective Quality Assessment in the Case of Simultaneous Interpreters
Judith Liebetrau, Thomas Sporer, Fraunhofer Institute for Digital Media Technology IDMT - Ilmenau, Germany; Paolo Tosoratti, DG Interpretation - Brussels, Belgium; Daniel Fröhlich, Sebastian Schneider, Jens-Oliver Fischer, Fraunhofer Institute for Digital Media Technology IDMT - Ilmenau, Germany
A study was conducted to explore various subjective quality aspects of audio-visual systems for interpreters. The study was designed to bring the existing requirements for on-site simultaneous interpretation up to date and to specify new requirements for remote interpretation (teleconferencing). The feasibility of using objective measurement methods in this context was examined. Several parameters influencing perceived quality, such as audio coding, video coding, room characteristics, and audio-visual latency, were assessed. The results partially contradict previous studies. This leads to the conclusion that perceived quality is strongly linked to the focus, background, and abilities of the assessors. The test design, realization, and results are presented, together with a comparison to studies conducted with different types of users.
Convention Paper 8409

P15-4 Analysis of Parameters of the Speech Signal Loudness of Professional Television Announcers
Mirjana Mihajlovic, Radio Television of Serbia - Belgrade, Serbia; Dejan Todorovic, Radio Belgrade - Belgrade, Serbia; Iva Salom, Institute Mihajlo Pupin - Belgrade, Serbia
The paper analyzes the speech loudness of professional announcers, whose voices account for a large percentage of television audio signals. Over 100 files of speech signals recorded with the same equipment were analyzed. For each file the authors calculated loudness and true peak according to ITU-R BS.1770 and EBU R 128, as well as RMS. Criteria were adopted for classifying the recordings, and the results were statistically analyzed. The analysis showed that the loudness of the speech signal depends on the gender of the speaker, intonation, and type of program material, and also that files of greater loudness do not always have a higher peak value than files of lower loudness. A significant overlap was noted between the short-term loudness and RMS values calculated with the same time constant.
Convention Paper 8410
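
The observation in the P15-4 abstract that a louder file need not have the higher peak can be illustrated with a small sketch. It uses plain RMS level and sample peak only. It is not an ITU-R BS.1770 loudness or true-peak measurement (no K-weighting, gating, or oversampling), and both test signals are synthetic assumptions made for this example.

```python
import math


def rms_db(samples):
    """RMS level in dB relative to full scale (1.0)."""
    mean_square = sum(s * s for s in samples) / len(samples)
    return 10.0 * math.log10(max(mean_square, 1e-12))


def sample_peak_db(samples):
    """Largest absolute sample value in dB relative to full scale (1.0)."""
    return 20.0 * math.log10(max(max(abs(s) for s in samples), 1e-12))


sample_rate = 8000
# A steady tone at half scale: high average level, moderate peak.
steady = [0.5 * math.sin(2 * math.pi * 200 * i / sample_rate) for i in range(sample_rate)]
# A quiet tone with one full-scale click: low average level, high peak.
clicky = [0.05 * math.sin(2 * math.pi * 200 * i / sample_rate) for i in range(sample_rate)]
clicky[4000] = 1.0

steady_rms, steady_peak = rms_db(steady), sample_peak_db(steady)
clicky_rms, clicky_peak = rms_db(clicky), sample_peak_db(clicky)
```

Here the steady tone has the higher RMS level while the signal with the click has the higher peak, so ranking files by peak and by average level can give opposite orders.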

P15-5 Improved Prediction of Nonstationary Frames for Lossless Audio Compression
Florin Ghido, Tampere University of Technology - Tampere, Finland
We present a new algorithm for improved prediction of nonstationary frames for asymmetrical lossless audio compression. Linear prediction is very efficient for decorrelating audio samples; however, it requires segmentation of the audio into quasi-stationary frames. Adaptive segmentation tries to minimize the total compressed size, including the quantized prediction coefficients for each frame, so longer frames that are not quite stationary may be selected. The new algorithm for computing the linear prediction coefficients improves the compressibility of nonstationary frames compared with the least squares method. With adaptive segmentation, the proposed algorithm leads to small but consistent compression improvements of up to 0.56% (0.11% on average). For faster encoding using fixed-size frames, without adaptive segmentation, it reduces the compression penalty by more than 0.21% on average.
Convention Paper 8411
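
For context on the linear prediction baseline that the P15-5 abstract compares against, the following sketch fits predictor coefficients for one frame by least squares, using the autocorrelation method solved with the Levinson-Durbin recursion. This is a conventional textbook approach and an assumption on our part. The abstract does not say which least squares variant the paper uses, and the paper's new algorithm is not reproduced. The test signal is synthetic.

```python
def autocorrelation(x, max_lag):
    """Autocorrelation r[0..max_lag] of one frame."""
    return [
        sum(x[n] * x[n - lag] for n in range(lag, len(x)))
        for lag in range(max_lag + 1)
    ]


def levinson_durbin(r, order):
    """Solve the normal equations for x[n] ~ sum(a[k] * x[n-1-k]).

    Returns (coefficients, prediction error energy).
    """
    a = [0.0] * order
    err = r[0]
    for i in range(order):
        acc = r[i + 1] - sum(a[j] * r[i - j] for j in range(i))
        k = acc / err  # reflection coefficient for this order
        new_a = a[:]
        new_a[i] = k
        for j in range(i):
            new_a[j] = a[j] - k * a[i - 1 - j]
        a = new_a
        err *= 1.0 - k * k
    return a, err


# Synthetic frame: impulse response of a stable second-order all-pole filter
# with hypothetical coefficients 1.2 and -0.5.
true_coeffs = [1.2, -0.5]
frame = [1.0, true_coeffs[0]]
for n in range(2, 200):
    frame.append(true_coeffs[0] * frame[n - 1] + true_coeffs[1] * frame[n - 2])

r = autocorrelation(frame, 2)
coeffs, err = levinson_durbin(r, 2)
```

On this stationary test frame the fit recovers the generating coefficients and the residual energy drops well below the frame energy `r[0]`. The abstract's point is that such a fit becomes less effective when a frame is not stationary.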

P15-6 A Lossless/Near-Lossless Audio Codec for Low Latency Streaming Applications on Embedded Devices
Neil Smyth, Cambridge Silicon Radio - Belfast, Northern Ireland
There is a growing need for high quality audio streaming in devices such as modular home audio networking systems, PMPs, and wireless loudspeakers. As wireless transmission capacities continue to increase, it is desirable to use the additional bandwidth for real-time wireless streaming of audio content coded in a lossless or near-lossless format. In this paper we discuss the development of an adaptive audio coding algorithm that balances the design goals of low latency, low complexity, error robustness, and a dynamically variable bit rate that scales to mathematically lossless coding under suitable conditions. Particular emphasis is placed on the optimization of the algorithm structure for real-time audio processing applications and on the mechanism by which "hybrid" lossless and near-lossless coding is achieved.
Convention Paper 8412

P15-7 Low-Delay Directional Audio Coding for Real-Time Human-Computer Interaction
Tapani Pihlajamäki, Ville Pulkki, Aalto University School of Electrical Engineering - Espoo, Finland
Games and virtual worlds require low-cost, good-quality, and low-delay audio engines. The short-time Fourier transform (STFT) based implementation of Directional Audio Coding (DirAC) for virtual worlds fulfills the first two demands. Unfortunately, the delay can be perceptibly large. In this paper, a modification to DirAC is introduced that uses different time-frequency resolutions for the STFT concurrently, which are not aligned in time in reproduction. As a result, the high-frequency non-diffuse content is reproduced as soon as possible, which reduces the perceived delay. Informal tests show that the delay was short enough for a musician to play an instrument through the processing. Moreover, the results of formal listening tests show that the reduction in quality is perceptible but not annoying.
Convention Paper 8413

