AES New York 2011
Paper Session P13
P13 - Low Bit-Rate Coding—Part 1
Friday, October 21, 2:00 pm — 4:30 pm (Room: 1E07)
Chair: Stephan Preihs, Leibniz Universität Hannover - Hannover, Germany
P13-1 Performance of MPEG Unified Speech and Audio Coding—Schuyler Quackenbush, Audio Research Labs - Scotch Plains, NJ, USA; Roch Lefebvre, Université de Sherbrooke - Sherbrooke, Quebec, Canada
The MPEG Unified Speech and Audio Coding (USAC) standard completed technical development in July 2011 and is expected to be published as International Standard ISO/IEC 23003-3, Unified Speech and Audio Coding, in late 2011. Verification tests were conducted to assess the subjective quality of this new specification. The test material consisted of 24 items of mixed speech and music content, and performance was assessed via subjective listening tests at coding rates ranging from 8 kb/s for mono material to 96 kb/s for stereo material. The mean performance of USAC was found to be better than that of MPEG High Efficiency AAC v2 (HE-AAC v2) and Adaptive Multi-Rate Wideband Plus (AMR-WB+) at all tested operating points, and for most bit rates tested this advantage is large. Furthermore, USAC provides much more consistent quality across all signal content types than the other systems tested. This paper summarizes the results of the verification tests.
Convention Paper 8514
P13-2 Improved Error Robustness for Predictive Ultra Low Delay Audio Coding—Michael Schnabel, Ilmenau University of Technology - Ilmenau, Germany; Michael Werner, Fraunhofer Institute for Integrated Circuits IIS - Erlangen, Germany; Gerald Schuller, Ilmenau University of Technology - Ilmenau, Germany
This paper proposes a new method for improving the audio quality of predictive perceptual audio coding in the context of the Ultra Low Delay (ULD) coding scheme for real-time applications. The commonly used auto-regressive (AR) signal model leads to an IIR predictor in the decoder. To allow random access into the transmitted stream and to recover from transmission errors, the predictor states in both encoder and decoder are reset. These resets reduce the prediction performance and thus the SNR, especially during stationary signal parts, and the resulting noise peaks can become audible. This paper shows that using adaptive reset intervals, chosen according to psychoacoustic rules, improves the audio quality.
Convention Paper 8515
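To make the reset mechanism in the ULD abstract concrete, here is a minimal sketch of closed-loop predictive (DPCM-style) coding with an AR predictor. It rests on several assumptions: fixed, hypothetical coefficients `COEFFS`, an assumed uniform quantizer step `STEP`, and reset positions passed in explicitly rather than chosen by psychoacoustic rules. It is not the ULD codec itself; it only illustrates why a shared reset lets the decoder's IIR predictor resynchronize after a transmission error.

```python
import math

# Hypothetical AR(2) coefficients (stable: pole radius sqrt(0.9)); not from the paper.
COEFFS = [1.8, -0.9]
STEP = 0.01  # assumed uniform quantizer step size


def _predict(hist):
    """Linear prediction from past reconstructed samples (newest first)."""
    return sum(a * h for a, h in zip(COEFFS, hist))


def encode(x, reset_at):
    """Closed-loop encoder: returns quantizer indices and the encoder's reconstruction."""
    hist = [0.0] * len(COEFFS)
    indices, recon = [], []
    for n, sample in enumerate(x):
        if n in reset_at:
            hist = [0.0] * len(COEFFS)  # predictor reset, mirrored in the decoder
        pred = _predict(hist)
        q = round((sample - pred) / STEP)
        y = pred + q * STEP
        indices.append(q)
        recon.append(y)
        hist = [y] + hist[:-1]
    return indices, recon


def decode(indices, reset_at):
    """Decoder: feedback through `hist` makes the synthesis an IIR filter."""
    hist = [0.0] * len(COEFFS)
    out = []
    for n, q in enumerate(indices):
        if n in reset_at:
            hist = [0.0] * len(COEFFS)
        y = _predict(hist) + q * STEP
        out.append(y)
        hist = [y] + hist[:-1]
    return out


x = [math.sin(0.1 * n) for n in range(96)]
resets = {0, 32, 64}  # fixed here; the paper adapts the reset intervals
indices, recon = encode(x, resets)
corrupted = list(indices)
corrupted[10] += 5  # simulated transmission error
out = decode(corrupted, resets)
```

Without errors the decoder reproduces the encoder's reconstruction exactly. With the corrupted index, the output drifts away from the encoder's reconstruction after sample 10 and becomes identical again from the reset at sample 32 onward. Sparser resets lengthen such error propagation, while denser resets cost prediction gain; this is the trade-off that the adaptive intervals in the abstract target.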
P13-3 AAC-ELD v2—The New State of the Art in High Quality Communication Audio Coding—Manfred Lutzky, María Luis Valero, Markus Schnell, Johannes Hilpert, Fraunhofer Institute for Integrated Circuits IIS - Erlangen, Germany
Recently, MPEG finished standardizing a Low Delay MPEG Surround tool that is tailored to enhance the widely adopted AAC-ELD low-delay codec for high-quality audio communication, yielding AAC-ELD v2. In combination with the Low Delay MPEG Surround tool, the coding efficiency for stereo content exceeds that of competing low-delay audio codecs by at least a factor of 2. This paper describes the technical challenges and solutions in designing a low-delay codec whose performance is comparable to that of existing state-of-the-art compression schemes with much higher delay, such as HE-AAC v2. It compares AAC-ELD v2 with competing proprietary and ITU-T codecs and provides guidelines for selecting the best possible operating points. Applications enabled by AAC-ELD v2 in broadcasting and mobile video conferencing are discussed.
Convention Paper 8516
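The stereo efficiency that the AAC-ELD v2 abstract attributes to the Low Delay MPEG Surround tool rests on a parametric idea: code one downmix channel plus a few spatial parameters per band instead of two full channels. The toy sketch below is a simplification, not the Low Delay MPEG Surround algorithm. It computes a channel level difference (CLD) and inter-channel coherence (ICC) for one parameter band and rebuilds two channels from the mono downmix using the CLD alone. The function names and the panning-style upmix are assumptions for illustration; a real decoder also uses the ICC to mix in a decorrelated signal, which is omitted here.

```python
import math


def stereo_params(left, right, eps=1e-12):
    """CLD in dB and ICC for one parameter band of real-valued subband samples."""
    e_l = sum(v * v for v in left)
    e_r = sum(v * v for v in right)
    cross = sum(l * r for l, r in zip(left, right))
    cld_db = 10.0 * math.log10((e_l + eps) / (e_r + eps))
    icc = cross / math.sqrt((e_l + eps) * (e_r + eps))
    return cld_db, icc


def downmix(left, right):
    """Mono downmix that the core codec would code."""
    return [(l + r) / 2.0 for l, r in zip(left, right)]


def upmix(mono, cld_db):
    """Panning-style upmix from the CLD only; gains sum to 2 so L + R = 2 * mono."""
    s = math.sqrt(10.0 ** (cld_db / 10.0))  # amplitude ratio |L| / |R|
    g_l = 2.0 * s / (1.0 + s)
    g_r = 2.0 / (1.0 + s)
    return [g_l * m for m in mono], [g_r * m for m in mono]


left = [1.0, -2.0, 0.5, 3.0]
right = [0.5 * v for v in left]  # fully coherent, about 6 dB quieter
cld, icc = stereo_params(left, right)
l_hat, r_hat = upmix(downmix(left, right), cld)
```

For this fully coherent pair the CLD-only upmix recovers both channels. For partially coherent signals (ICC below 1) it would collapse the stereo image, which is why the ICC and a decorrelator are needed in practice.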
P13-4 QMF-Based Harmonic Spectral Band Replication—Haishan Zhong, Panasonic Corporation; Lars Villemoes, Per Ekstrand, Dolby Sweden AB; Sascha Disch, Frederik Nagel, Stephan Wilde, Fraunhofer Institute for Integrated Circuits IIS - Erlangen, Germany; Kok Seng Chong, Takeshi Norimatsu, Panasonic Corporation
Unified Speech and Audio Coding (USAC) is the next step in the evolution of audio codecs standardized by the Moving Picture Experts Group (MPEG). USAC provides consistent quality for music, speech, and mixed material by extending the strengths of a general audio codec with speech codec functions. Two novel flavors of harmonic Spectral Band Replication (SBR) were introduced to enhance the perceptual quality of SBR: one based on the Discrete Fourier Transform (DFT) and one based on the Quadrature Mirror Filterbank (QMF). The DFT-based harmonic SBR has higher frequency resolution for the harmonic transposition process, resulting in good sound quality, whereas the QMF-based harmonic SBR has significantly lower computational complexity. This paper describes in detail the technical aspects of the low-complexity QMF-based harmonic SBR tool within USAC and also presents a complexity comparison and listening test results.
Convention Paper 8517
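Harmonic transposition, as opposed to the copy-up patching of classic SBR, creates high-band content by multiplying frequencies by an integer factor. Its core nonlinear step on complex-valued subband samples can be sketched as a phase-vocoder operation: keep each sample's magnitude and multiply its phase by the transposition factor. This is an illustrative sketch only; the function name is hypothetical, and the standardized QMF-based and DFT-based tools add filterbanks, windowing, subband mapping, and further processing not shown here.

```python
import cmath


def transpose_phase(samples, factor):
    """Scale the instantaneous phase of complex samples by an integer factor.

    For a complex exponential A*exp(j*w*n) this yields A*exp(j*factor*w*n):
    the frequency is multiplied by `factor` while the envelope is kept.
    Phase wrapping is harmless because `factor` is an integer.
    """
    if not isinstance(factor, int) or factor < 1:
        raise ValueError("factor must be a positive integer")
    return [abs(z) * cmath.exp(1j * factor * cmath.phase(z)) for z in samples]


w = 0.3
x = [0.7 * cmath.exp(1j * w * n) for n in range(50)]
y = transpose_phase(x, 2)  # a component at frequency 2 * w, same amplitude
```

A single complex exponential is transposed exactly. When one subband contains several partials, plain phase multiplication no longer yields the sum of the transposed partials, which is presumably why finer frequency resolution, as in the DFT-based variant, benefits sound quality.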
P13-5 Perceptually Optimized Cascaded Long Term Prediction of Polyphonic Signals for Enhanced MPEG-AAC—Tejaswi Nanjundaswamy, Kenneth Rose, University of California Santa Barbara - Santa Barbara, CA, USA
MPEG-4 Advanced Audio Coding uses the long term prediction (LTP) tool to exploit inter-frame correlations by providing a segment of previously reconstructed samples as the prediction for the current frame, which is naturally useful for encoding signals with a single periodic component. However, most audio signals are polyphonic in nature, containing a mixture of several periodic components. While such polyphonic signals are themselves periodic, with an overall period equal to the least common multiple of the individual component periods, the signal rarely remains sufficiently stationary over such an extended period, rendering the LTP tool ineffective. The LTP tool is further hindered by the typical practice of selecting its parameters to minimize the mean squared error rather than the perceptual distortion criteria defined for audio coding. We thus propose a technique that exploits the correlation of each periodic component with its immediate past while taking the perceptual distortion criteria into account. Specifically, we propose cascading LTP filters corresponding to individual periodic components, designed in a two-stage method: an initial set of parameters is estimated backward-adaptively to minimize the mean squared prediction error, and a refinement stage then adjusts the parameters to minimize the perceptual distortion. Objective and subjective results validate the effectiveness of the proposal on a variety of polyphonic signals.
Convention Paper 8518
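The motivation for cascading LTP filters can be checked numerically. In the sketch below, a signal is the sum of two periodic components with periods 5 and 7, so its overall period is their least common multiple, 35. One LTP stage with lag 5 removes only the first component, while a cascade of two stages with lags 5 and 7, each a prediction error filter of the form 1 - g z^(-L), cancels both after a short start-up. The lags and unit gains are assumed known here; the paper's two-stage parameter estimation is not reproduced, and the helper names are hypothetical.

```python
from math import lcm


def ltp_stage(x, lag, gain):
    """One long-term prediction error filter: e[n] = x[n] - gain * x[n - lag]."""
    return [x[n] - gain * (x[n - lag] if n >= lag else 0.0) for n in range(len(x))]


def cascaded_ltp(x, stages):
    """Apply LTP stages in series, i.e. the product of (1 - g_i z^-L_i)."""
    residual = list(x)
    for lag, gain in stages:
        residual = ltp_stage(residual, lag, gain)
    return residual


comp1 = [3, -1, 4, 1, -5]          # period 5
comp2 = [2, 7, -1, 8, -2, 8, 1]    # period 7
x = [float(comp1[n % 5] + comp2[n % 7]) for n in range(80)]

overall_period = lcm(5, 7)                       # lag a single LTP stage would need
single = cascaded_ltp(x, [(5, 1.0)])             # leaves the period-7 component
cascade = cascaded_ltp(x, [(5, 1.0), (7, 1.0)])  # zero from sample 12 onward
```

The cascade needs only 5 + 7 = 12 samples of history to cancel both components, versus 35 for a single stage at the combined period, which reflects the stationarity argument made in the abstract.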
Information Last Updated: 2011-10-05, mei