AES 110th Convention: Session C

Session C Saturday, May 12 13:30 - 18:00 hr Room B

Low-Bit Rate Audio Coding

Chair: Marina Bosi, MPEG LA, Denver, CO, USA

13:30 hr C-1
Processor-Efficient Implementation of a High Quality MPEG-2 AAC Encoder
Toshiyuki Nomura & Yuichiro Takamizawa
NEC Corporation, Kawasaki, Japan

This paper describes a processor-efficient implementation of a high quality MPEG-2 AAC encoder employing fast psycho-acoustic analysis, efficient encoding of side information, and SIMD instructions. Performing the psycho-acoustic analysis in the MDCT domain reduces computational cost. Smoothing of scale factors and optimized selection of Huffman tables are introduced to encode the side information efficiently. SIMD instructions are heavily used in the MDCT and quantization processes to improve the encoding speed. Seven-grade comparison MOS test results show that the AAC encoder at 96 kbps/stereo achieves sound quality equivalent to that of MP3 at 128 kbps/stereo. The encoder works 13 times faster than real-time for stereo encoding on an 800 MHz Pentium III processor.
Paper 5294

14:00 hr C-2
A Multichannel Audio Coding Algorithm for Inter-Channel Redundancy Removal
Ye Wang (1), Leonid Yaroslavsky (2), Miikka Vilermo (1) & Mauri Vaananen (1)
(1) Nokia Research Center, Tampere, Finland
(2) Tel Aviv University, Ramat Aviv, Israel

This paper presents a novel lossless multichannel audio coding algorithm to remove inter-channel redundancy. We employ an Integer-to-Integer Discrete Cosine Transform (INT-DCT) to perform inter-channel decorrelation after quantization of the Modified Discrete Cosine Transform (MDCT) coefficients of the individual channels. Compared with a Karhunen-Loeve Transform (KLT) based approach, our new method has three major advantages: 1) it avoids spreading quantization noise to other channels; 2) it is computationally simple; 3) it uses less overhead information (our algorithm avoids a quantized covariance matrix or eigenvectors), while having a similar decorrelation capability.
Paper 5295
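
The idea of an integer-to-integer transform applied to already-quantized coefficients can be illustrated with a minimal two-channel sketch. This is not the authors' INT-DCT: it is a generic 45-degree rotation built from three lifting steps, a standard way to make a rotation exactly reversible on integers. The function names `forward` and `inverse` and the choice of angle are assumptions made for illustration only.

```python
import math

# Illustrative assumption: a 45-degree rotation, which maps a channel pair
# to (scaled difference, scaled sum). Not taken from the paper.
ANGLE = math.pi / 4
C = (math.cos(ANGLE) - 1.0) / math.sin(ANGLE)  # lifting coefficient
S = math.sin(ANGLE)                            # lifting coefficient


def _round(v):
    """Round to nearest integer; any deterministic rounding would work."""
    return math.floor(v + 0.5)


def forward(x, y):
    """Map two integer coefficients to two integers via three lifting steps.

    Returns (d, s), where d is close to (x - y) / sqrt(2) and s is close to
    (x + y) / sqrt(2).
    """
    x += _round(C * y)
    y += _round(S * x)
    x += _round(C * y)
    return x, y


def inverse(d, s):
    """Undo forward() exactly by reversing the lifting steps."""
    d -= _round(C * s)
    s -= _round(S * d)
    d -= _round(C * s)
    return d, s


if __name__ == "__main__":
    left, right = 100, 97          # two strongly correlated quantized values
    d, s = forward(left, right)
    print(d, s)                    # difference-like value is small
    print(inverse(d, s))           # recovers (100, 97) exactly
```

Because each lifting step only adds a rounded function of the other value, the inverse subtracts the same quantity and the mapping is lossless, so no additional quantization noise is introduced or spread between channels.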

14:30 hr C-3
Compander Domain Approach to Scalable AAC
Ashish Aggarwal, Kenneth Rose & Shankar Regunathan
University of California, Santa Barbara, CA, USA

We propose a new approach to achieve efficient scalability in audio coders, and demonstrate its performance using the MPEG-4 Advanced Audio Coder (AAC). In conventional scalable coding, the enhancement layer performs straightforward re-quantization of the base-layer reconstruction error. This coding scheme implicitly discards useful information from the base layer, and does not truly minimize a perceptually meaningful distortion criterion such as the noise-mask ratio. We reformulate the problem of scalable coding within a companding framework, and show that re-quantization in the compander's compressed domain achieves, in the asymptotic sense, optimal scalability. Based on this observation, we develop a scalable AAC coder which performs enhancement-layer quantization while exploiting all the information available at that layer. Simulation results of a two-layer scalable coder on the standard test database of audio sampled at 44.1 kHz show that the proposed approach yields substantial savings in bit rate for a given reproduction quality.
Paper 5296
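
Re-quantization in the compander's compressed domain, as described in the abstract above, can be sketched for a single coefficient as follows. This is a simplified illustration, not the authors' coder: it assumes a power-law compander with exponent 0.75 (the exponent used by AAC's nonuniform quantizer) and uniform quantizers with arbitrary, hypothetical step sizes `base_step` and `enh_step`. There is no psycho-acoustic model or entropy coding.

```python
import math


def compress(x, power=0.75):
    """Power-law compressor applied to a signed coefficient."""
    return math.copysign(abs(x) ** power, x)


def expand(y, power=0.75):
    """Inverse of compress()."""
    return math.copysign(abs(y) ** (1.0 / power), y)


def quantize(v, step):
    """Uniform quantizer: index of the nearest multiple of step."""
    return math.floor(v / step + 0.5)


def encode_two_layers(x, base_step=1.0, enh_step=0.25):
    """Return (base index, enhancement index) for one coefficient.

    The enhancement layer quantizes the base-layer residual in the
    compressed domain rather than the reconstruction error in the
    original domain.
    """
    y = compress(x)
    base_idx = quantize(y, base_step)
    residual = y - base_idx * base_step      # compressed-domain residual
    enh_idx = quantize(residual, enh_step)
    return base_idx, enh_idx


def decode(base_idx, enh_idx=None, base_step=1.0, enh_step=0.25):
    """Reconstruct from the base layer alone or from both layers."""
    y_hat = base_idx * base_step
    if enh_idx is not None:
        y_hat += enh_idx * enh_step
    return expand(y_hat)


if __name__ == "__main__":
    x = 12.5
    b, e = encode_two_layers(x)
    print(decode(b), decode(b, e))  # base-only vs. two-layer reconstruction
```

In this sketch the enhancement layer refines the same companded value that the base layer quantized, so in the compressed domain its error is bounded by half the enhancement step.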

15:00 hr C-4
An Embedding Codec for Multiple Generations Compression Based on MPEG-1 Layer III
Frank Kurth & Andreas Ribbrock
University of Bonn, Bonn, Germany

We propose an MPEG-1 Layer III conforming audio codec for multiple generations (cascaded) compression without loss of perceptual quality. Previous research addressing this topic mainly focused on less complex coding schemes such as MPEG-1 Layer II. In our paper, the techniques proposed in those approaches are extended to comply with Layer III's advanced features such as hybrid filtering, block switching, and bit reservoir based processing. A prototype implementation, together with extensive listening tests, shows the feasibility of perceptually stable cascaded Layer III compression.
Paper 5297

15:30 hr C-5
Audio Coding with a Masking Threshold Adapted Wavelet Packet based on Run-Time Reconfigurable Processor Architecture
Alexey Petrovsky, Belarusian State University, Minsk, Belarus
Alexander Petrovsky, Technical University of Bialystok, Bialystok, Poland

An adaptive wavelet packet (WP) tree derived via dynamic algorithm transforms (DATs) is presented. The DAT defines the parameters of the input audio signals (subband entropy) and of the output coded sequences (subband rate) for a given embedded system architecture. A DAT-based pipeline processor that implements the WP tree analysis (encoder) and synthesis (decoder) algorithms on reconfigurable hardware (such as an SRAM-based FPGA with distributed arithmetic) is described.
Paper 5298

16:00 hr C-6
Low Delay Audio Coding Incorporating Psycho-Acoustic Information
Manuel Zurera, Francisco Lopez-Ferreras, Damian Martinez Muñoz & Fernando Cruz Roldan
Universidad de Alcala, Madrid, Spain

This paper deals with the design and implementation of a scheme for CD quality audio coding that introduces a delay as low as 6 ms and provides near transparent coding. It is implemented with a uniform filter bank that decomposes the audio signal into thirty-two bands. For such a delay, the filters in the filter bank must have a short impulse response, so the filters' non-ideal frequency responses must be taken into account. Near transparent coding is achieved at 96 kbps, which is a very good result for such a low delay.
Paper 5299

16:30 hr C-7
Error Protection and Concealment for HILN MPEG-4 Parametric Audio Coding
Heiko Purnhagen, Bernd Edler & Nikolaus Meine
University of Hannover, Hannover, Germany

The HILN (Harmonic and Individual Lines plus Noise) MPEG-4 parametric audio coding tool allows efficient representation of general audio signals at very low bit rates. Possible applications therefore include transmission over IP or wireless channels, both of which are characterized by specific transmission error models. However, since parametric audio coding is a relatively new technique compared to transform coding and CELP speech coding, there have been only very limited investigations of HILN's behavior in error-prone environments. In this paper we present an analysis of error sensitivities and approaches to error protection and concealment.
Paper 5300

17:00 hr C-8
Avoiding Overlapping in a Time-Varying Wavelet-Packet Based Audio Coder
Manuel Zurera (1), Nicolas Ruiz-Reyes (2), Pedro Vera-Candeas (2) & Francisco Lopez-Ferreras (1)
(1) Universidad de Alcala, Madrid, Spain
(2) Universidad de Jaen, Linares (Jaen), Spain

A new algorithm that avoids overlapping adjacent frames while still reducing the block effect is presented. The algorithm is based on forward and backward prediction at the frame borders and has been applied successfully to an audio coder based on time-varying wavelet-packet decompositions that uses symmetrical and periodic extension as the method for processing frames in isolation.
Paper 5301

17:30 hr C-9
Realtime Implementations of MPEG-2 and MPEG-4 Natural Audio Coders
Alberto Duenas (1), Begoña Rivas (1), Antonio Pena (2), Rafael Perez (1) & Enrique Alexandre (2)
(1) Prodys, Madrid, Spain
(2) Universidade de Vigo, Vigo, Spain

MPEG natural audio coders, such as MPEG-1/2 Layer III and MPEG-2/4 AAC, require a great amount of calculation, mainly due to the iterative bit allocation processes proposed by the ISO/IEC technical documentation. This complexity makes a real-time implementation of the normative algorithms difficult. To address this difficulty, this paper discusses a set of non-optimal solutions that reduce the computing load and, based on these solutions, presents a real-time implementation of an MPEG-1/2 Layer III coder using a single fixed-point DSP. In addition, techniques to achieve good audio performance and methods for the adaptation of parameters are discussed.
Paper 5302
