AES New York 2015
Poster Session P8
P8 - Signal Processing
Friday, October 30, 11:00 am — 12:30 pm (S-Foyer 1)
P8-1 Robust MPEG-4 High-Efficiency AAC With Fixed- and Variable-Length Soft-Decision Decoding—Sai Han, Technische Universität Braunschweig - Braunschweig, Germany; Tim Fingscheidt, Technische Universität Braunschweig - Braunschweig, Germany
MPEG-4 High-Efficiency Advanced Audio Coding (HE-AAC) is optimized for low-bit-rate applications such as digital radio broadcasting and wireless music streaming. In HE-AAC, the differential scale factors and the quantized spectral coefficients are variable-length coded (VLC) with Huffman codes, and the common reference value of the scale factors is a fixed-length coded global gain. Because errors propagate through VLC-coded data, a robust source decoder is desirable when HE-AAC is transmitted over an error-prone channel. Unlike traditional hard-decision decoding or error concealment, soft-decision decoding exploits bit-wise channel reliability information and thereby improves audio quality. In this work we apply fixed-length soft-decision decoding to the global gain and variable-length soft-decision decoding to the scale factors. Simulation results show clearly improved performance.
Convention Paper 9399
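To make the fixed-length case concrete, the following Python sketch computes a soft-decision estimate of an 8-bit value (the width of the AAC global gain field) from per-bit log-likelihood ratios (LLRs). It is not the authors' decoder: the LLR sign convention (positive favors bit 0), the MSB-first bit order, the memoryless channel, and the uniform default prior are assumptions, and `soft_decode_fixed_length` and `log_sigmoid` are hypothetical names. It returns both the MMSE estimate (posterior mean) and the MAP value.

```python
import math


def log_sigmoid(x):
    """Numerically stable log(1 / (1 + exp(-x)))."""
    return -(max(-x, 0.0) + math.log1p(math.exp(-abs(x))))


def soft_decode_fixed_length(llrs, prior=None):
    """Soft-decision estimate of a fixed-length code word (hypothetical helper).

    llrs[b] = log(P(bit b = 0) / P(bit b = 1)), bits ordered MSB first (assumed).
    prior[v] = a-priori probability of value v (uniform if None, an assumption).
    Returns (mmse_estimate, map_value, posterior).
    """
    n_bits = len(llrs)
    n_vals = 1 << n_bits
    if prior is None:
        prior = [1.0 / n_vals] * n_vals

    log_post = []
    for v in range(n_vals):
        lp = math.log(prior[v]) if prior[v] > 0.0 else -math.inf
        for b, llr in enumerate(llrs):
            bit = (v >> (n_bits - 1 - b)) & 1
            # P(bit = 0) = sigmoid(llr), P(bit = 1) = sigmoid(-llr)
            lp += log_sigmoid(llr if bit == 0 else -llr)
        log_post.append(lp)

    # Normalize in the log domain to avoid underflow
    peak = max(log_post)
    weights = [math.exp(lp - peak) for lp in log_post]
    total = sum(weights)
    posterior = [w / total for w in weights]

    mmse = sum(v * p for v, p in enumerate(posterior))
    map_value = max(range(n_vals), key=posterior.__getitem__)
    return mmse, map_value, posterior


# Example: the 8-bit value 100 received with high per-bit reliability
value, n_bits = 100, 8
llrs = [20.0 if ((value >> (n_bits - 1 - b)) & 1) == 0 else -20.0 for b in range(n_bits)]
mmse, map_value, _ = soft_decode_fixed_length(llrs)
print(round(mmse, 3), map_value)
```

With all LLRs at zero (no channel information) and a uniform prior, the MMSE estimate falls back to the mid-range value 127.5 instead of an arbitrary hard decision. A practical decoder would replace the uniform prior with measured parameter statistics; the variable-length case for the scale factors is more involved and is not shown.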
P8-2 Extension of Monaural to Stereophonic Sound Based on Deep Neural Networks—Chan Jun Chun, Gwangju Institute of Science and Technology (GIST) - Gwangju, Korea; Seok Hee Jeong, Gwangju Institute of Science and Technology (GIST) - Gwangju, Korea; Su Yeon Park, Gwangju Institute of Science and Tech (GIST) - Gwangju, Korea; Hong Kook Kim, Gwangju Institute of Science and Tech (GIST) - Gwangju, Korea; City University of New York - New York, NY, USA
In this paper we propose a method for extending monaural sound to stereophonic sound based on deep neural networks (DNNs). First, the monaural signal is assumed to be the mid signal of the extended stereo signal. Next, the residual signals are obtained by linear prediction (LP) analysis, and the LP coefficients of the monaural signal are converted into line spectral frequency (LSF) coefficients. These LSF coefficients serve as the DNN features, and the features of the side signal are estimated from those of the mid signal. The performance of the proposed method is evaluated using a log spectral distortion (LSD) measure and a multiple stimuli with hidden reference and anchor (MUSHRA) test. The comparison shows that the proposed method achieves a lower LSD and a higher MUSHRA score than a conventional method based on a hidden Markov model (HMM).
Convention Paper 9400
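The LP analysis and LSF conversion steps can be illustrated with a small pure-Python sketch. Levinson-Durbin recursion on the autocorrelation sequence gives A(z) = 1 + a1 z^-1 + ... + ap z^-p, the residual is the output of A(z), and the LSFs are the angles in (0, π) of the unit-circle zeros of P(z) = A(z) + z^-(p+1) A(z^-1) and Q(z) = A(z) - z^-(p+1) A(z^-1). The grid search with bisection used for root finding, the function names, and the synthetic AR(2) test signal are assumptions for illustration; the DNN that maps mid-signal features to side-signal features is not shown.

```python
import math
import random


def autocorrelation(x, max_lag):
    """Biased autocorrelation r[k] = sum_n x[n] x[n + k] for k = 0..max_lag."""
    n = len(x)
    return [sum(x[i] * x[i + k] for i in range(n - k)) for k in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Return A(z) coefficients [1, a1, ..., ap] and the final prediction error power."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err  # i-th reflection coefficient
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k
    return a, err


def lp_residual(x, a):
    """e[n] = sum_j a[j] * x[n - j], with x[n] = 0 for n < 0."""
    return [sum(a[j] * x[n - j] for j in range(len(a)) if n - j >= 0)
            for n in range(len(x))]


def _find_roots(func, n_grid=2048, iters=60):
    """Zeros of a real function on (0, pi): sign-change grid search plus bisection."""
    roots = []
    prev_w = math.pi / n_grid
    prev_f = func(prev_w)
    for i in range(2, n_grid):
        w = math.pi * i / n_grid
        f = func(w)
        if prev_f == 0.0:
            roots.append(prev_w)
        elif prev_f * f < 0.0:
            lo, hi, f_lo = prev_w, w, prev_f
            for _ in range(iters):
                mid = 0.5 * (lo + hi)
                f_mid = func(mid)
                if f_lo * f_mid <= 0.0:
                    hi = mid
                else:
                    lo, f_lo = mid, f_mid
            roots.append(0.5 * (lo + hi))
        prev_w, prev_f = w, f
    return roots


def lpc_to_lsf(a):
    """LSFs (radians in (0, pi)) of A(z) = sum_k a[k] z^-k, with a[0] = 1."""
    p = len(a) - 1
    ext = list(a) + [0.0]
    sym = [ext[k] + ext[p + 1 - k] for k in range(p + 2)]   # P(z): symmetric
    anti = [ext[k] - ext[p + 1 - k] for k in range(p + 2)]  # Q(z): antisymmetric
    m = (p + 1) / 2.0

    # On the unit circle, P and Q reduce (up to a phase factor) to these real functions
    def p_real(w):
        return sum(c * math.cos(w * (k - m)) for k, c in enumerate(sym))

    def q_real(w):
        return sum(c * math.sin(w * (k - m)) for k, c in enumerate(anti))

    return sorted(_find_roots(p_real) + _find_roots(q_real))


# Synthetic AR(2) "monaural" frame: x[n] = 1.3 x[n-1] - 0.6 x[n-2] + noise
rng = random.Random(0)
x = [0.0, 0.0]
for _ in range(2000):
    x.append(1.3 * x[-1] - 0.6 * x[-2] + rng.gauss(0.0, 1.0))
x = x[2:]

a, _ = levinson_durbin(autocorrelation(x, 2), 2)
lsf = lpc_to_lsf(a)
residual = lp_residual(x, a)
print(a, lsf)
```

For A(z) = 1 (all predictor coefficients zero) the LSFs are equally spaced, so `lpc_to_lsf([1.0, 0.0, 0.0])` returns approximately [π/3, 2π/3], which is a convenient sanity check.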
P8-3 Nonnegative Tensor Factorization-Based Wind Noise Reduction—Kwang Myung Jeon, Gwangju Institute of Science and Technology (GIST) - Gwangju, Korea; Ji Hyun Park, Gwangju Institute of Science and Technology (GIST) - Gwangju, Korea; Seung Woo Yu, Gwangju Institute of Science and Technology - Gwangju, Korea; Young Han Lee, Korea Electronics Technology Institute (KETI) - Seongnam-si, Gyeonggi-do, Korea; Choong Sang Cho, KETI/Multimedia-IP - SeongNam-si, Gyeonggi-do, Korea; Hong Kook Kim, Gwangju Institute of Science and Tech (GIST) - Gwangju, Korea; City University of New York - New York, NY, USA
In this paper a wind noise reduction method based on nonnegative tensor factorization (NTF) is proposed to enhance the quality of audio recorded with an outdoor multichannel microphone array. The proposed method first learns NTF bases from exemplar blocks of spectral magnitudes for a range of wind noises and audio contents. Next, the spectral magnitudes of the wind noise to be reduced are estimated from the exemplar blocks. Finally, a multichannel wind noise reduction filter is constructed under a minimum mean squared error (MMSE) criterion and applied to the noisy multichannel signal to obtain a signal with reduced wind noise. The performance of the proposed method is compared with that of conventional wind noise reduction methods based on minimum statistics (MS) and nonnegative matrix factorization (NMF). The results show that the proposed method provides a higher signal-to-distortion ratio (SDR), signal-to-interference ratio (SIR), and signal-to-artifact ratio (SAR) than the conventional methods under various signal-to-noise ratio (SNR) conditions.
Convention Paper 9401
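A heavily simplified sketch of the exemplar-based idea: with fixed, pre-trained bases for audio and for wind noise, the activations for one magnitude-spectrum frame are estimated with multiplicative NMF updates (Euclidean cost), and a Wiener-type per-bin gain is formed from the reconstructed audio and noise magnitudes. This swaps in single-channel matrix factorization for the paper's tensor factorization and a per-bin Wiener gain for its multichannel MMSE filter; the toy bases and the names `nmf_activations` and `wiener_gains` are hypothetical.

```python
def nmf_activations(v, w, n_iter=200, eps=1e-12):
    """Nonnegative activations h with v ~= W h for a FIXED basis matrix W (rows = bins).

    Multiplicative update for the Euclidean cost: h <- h * (W^T v) / (W^T W h).
    """
    n_bins, n_basis = len(w), len(w[0])
    h = [1.0] * n_basis
    for _ in range(n_iter):
        wh = [sum(w[f][k] * h[k] for k in range(n_basis)) for f in range(n_bins)]
        h = [h[k] * sum(w[f][k] * v[f] for f in range(n_bins))
             / (sum(w[f][k] * wh[f] for f in range(n_bins)) + eps)
             for k in range(n_basis)]
    return h


def wiener_gains(v, w_audio, w_noise, n_iter=200, eps=1e-12):
    """Per-bin gains |A|^2 / (|A|^2 + |N|^2) from fixed audio and noise bases."""
    n_audio = len(w_audio[0])
    w = [ra + rn for ra, rn in zip(w_audio, w_noise)]  # stack bases side by side
    h = nmf_activations(v, w, n_iter, eps)
    audio = [sum(c * hk for c, hk in zip(row, h[:n_audio])) for row in w_audio]
    noise = [sum(c * hk for c, hk in zip(row, h[n_audio:])) for row in w_noise]
    return [a * a / (a * a + n * n + eps) for a, n in zip(audio, noise)]


# Toy frame: bins 0-1 hold "audio", bins 2-3 hold "wind noise" (hypothetical bases)
w_audio = [[1.0], [1.0], [0.0], [0.0]]
w_noise = [[0.0], [0.0], [1.0], [1.0]]
noisy_mag = [2.0, 2.0, 3.0, 3.0]
gains = wiener_gains(noisy_mag, w_audio, w_noise)
enhanced_mag = [g * m for g, m in zip(gains, noisy_mag)]
print([round(g, 6) for g in gains])
```

In a real system the bases would come from training data and the gain would be applied to the complex STFT of each channel; the toy bases here are disjoint only so that the expected result is easy to check.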
P8-4 Detection and Removal of the Birdies Artifact in Low Bit-Rate Audio—Simon Desrochers, Université de Sherbrooke - Sherbrooke, QC, Canada; Roch Lefebvre, Université de Sherbrooke - Sherbrooke, QC, Canada
Audio signals compressed at low bit rates are known to exhibit audible artifacts that degrade perceptual quality. These artifacts have been documented, and many authors have proposed solutions that modify the internal codec mechanisms responsible for them. In this paper we propose a post-processing approach that detects and removes the birdies artifact by modeling spectral components as partials. Because it requires only the compressed signal, this approach is compatible with any codec. Formal listening tests have shown that the prototype algorithm can increase the perceptual quality of signals affected by birdies. Furthermore, the explicit detection of this artifact could eventually be used in an objective perceptual quality assessment algorithm.
Convention Paper 9402
P8-5 Using Cascaded Global Optimization for Filter Bank Design in Low Delay Audio Coding—Stephan Preihs, Leibniz Universität Hannover - Hannover, Germany; Jörn Ostermann, Leibniz Universität Hannover - Hannover, Germany
This paper demonstrates that suitable design parameters for a filter bank optimization procedure can be found with global optimization techniques. After the Nayebi filter bank optimization algorithm is summarized and its degrees of freedom are described, a global optimization framework for the parameters involved in the design process is presented. It includes a cost function that monitors the frequency characteristics and reconstruction properties of different filter bank designs, and it enables an automatic search for suitable design parameters for a given number of bands and taps and a predefined delay. The global optimization itself uses well-known methods such as pattern search and the genetic algorithm. Experiments show that our method makes manual parameter adjustment obsolete. Furthermore, compared with manually adjusted designs, the proposed cascaded optimization achieves a gain of up to 10 dB in stopband attenuation without loss of reconstruction quality.
Convention Paper 9403
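The cascaded idea can be sketched as a coarse global stage followed by a local pattern search. In the Python sketch below, uniform random sampling stands in for the genetic algorithm stage, `pattern_search` is a basic compass search (poll plus and minus one step along each axis, halve the step when no poll improves), and a toy quadratic replaces the paper's cost function, which scores the frequency characteristics and reconstruction properties of a Nayebi design. All names and settings are assumptions.

```python
import random


def pattern_search(cost, x0, step=1.0, min_step=1e-6, max_iter=100000):
    """Compass pattern search: poll +/- step on each axis, halve the step on failure."""
    x, fx = list(x0), cost(x0)
    for _ in range(max_iter):
        if step <= min_step:
            break
        improved = False
        for i in range(len(x)):
            for sign in (1.0, -1.0):
                cand = list(x)
                cand[i] += sign * step
                fc = cost(cand)
                if fc < fx:  # accept the first improving poll on this axis
                    x, fx, improved = cand, fc, True
                    break
        if not improved:
            step *= 0.5
    return x, fx


def cascaded_search(cost, bounds, n_samples=200, seed=0):
    """Stage 1: coarse random sampling of the box; stage 2: pattern-search refinement."""
    rng = random.Random(seed)
    best_x, best_f = None, float("inf")
    for _ in range(n_samples):
        cand = [rng.uniform(lo, hi) for lo, hi in bounds]
        fc = cost(cand)
        if fc < best_f:
            best_x, best_f = cand, fc
    initial_step = 0.25 * max(hi - lo for lo, hi in bounds)
    return pattern_search(cost, best_x, step=initial_step)


# Toy stand-in for a filter bank design cost (NOT the Nayebi cost): minimum at (1, -2)
def toy_cost(p):
    return (p[0] - 1.0) ** 2 + (p[1] + 2.0) ** 2


x_opt, f_opt = cascaded_search(toy_cost, [(-5.0, 5.0), (-5.0, 5.0)])
print(x_opt, f_opt)
```

The global stage only has to land in the right basin; the pattern search then refines the parameters without needing gradients, which suits cost functions built from measured filter properties.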
P8-6 Effect of Reverberation on Overtone Correlations in Speech and Music—Sarah R. Smith, University of Rochester - Rochester, NY, USA; Mark F. Bocko, University of Rochester - Rochester, NY, USA
This paper explores the effect of reverberation on audio signals with a harmonically rich overtone spectrum, such as speech and many musical instrument sounds. A proposed metric characterizes the degree of reverberation based on the cross-correlation of the instantaneous frequency tracks of the signal's overtones. Sounds that exhibit near-perfect correlations in an anechoic environment are found to become less correlated when passed through a reverberant channel. These results are demonstrated for a variety of music and speech tones using both natural recordings and synthetic reverberation. The proposed metric corresponds to the speech transmission index (STI) and thus may be employed as a quantitative measure of the amount of reverberation in a recording.
Convention Paper 9404
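As a sketch of the correlation idea, the Python code below averages the pairwise Pearson correlation between overtone instantaneous-frequency tracks. The tracks are assumed to be already estimated (for example from STFT phase differences), the pairwise averaging is an assumed aggregation rather than the paper's exact metric, and the independent per-overtone jitter is only a crude stand-in for reverberation, not an acoustic model.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def overtone_correlation(if_tracks):
    """Mean pairwise correlation of overtone instantaneous-frequency tracks (Hz per frame)."""
    scores = [pearson(if_tracks[i], if_tracks[j])
              for i in range(len(if_tracks))
              for j in range(i + 1, len(if_tracks))]
    return sum(scores) / len(scores)


# Anechoic-like case: overtone k follows k * f0(t) exactly (vibrato on f0)
f0 = [220.0 + 3.0 * math.sin(0.3 * n) for n in range(200)]
clean = [[k * f for f in f0] for k in range(1, 5)]

# Crude stand-in for reverberation: independent frequency jitter on each overtone
rng = random.Random(0)
jittered = [[f + rng.gauss(0.0, 3.0) for f in track] for track in clean]

print(overtone_correlation(clean), overtone_correlation(jittered))
```

Tracks that are exact multiples of one f0 contour score 1 (up to rounding), while the jittered tracks score noticeably lower, mirroring the trend described in the abstract.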
P8-7 Stacked Modulation in a Hall Reverberation Algorithm—Kelsey M. Cintorino, Roger Williams University - Bristol, RI, USA; Daniel M. Wisniewski, Roger Williams University - Bristol, RI, USA; Benjamin D. McPheron, Roger Williams University - Bristol, RI, USA
Reverberation is the reflection of sound by objects in a space, much as the visual world is perceived through the reflection of light. Novel reverberation algorithms are in high demand in the music industry because of changing trends and the desire for unique sounds. As DSP hardware has improved, it has become easier to combine multiple effects in a single algorithm. This paper presents a hall algorithm augmented with a series of chorus modulation blocks in an attempt to create new sounds. Chorus blocks are added before the early decay phase of the hall algorithm as well as within the late reverb generation phase. The result is a stacked-modulation reverberation algorithm.
Convention Paper 9405
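A chorus block is typically a delay line whose length is modulated by a low-frequency oscillator (LFO), mixed with the dry signal. The Python sketch below implements one such block, using linear interpolation for the fractional delay, plus a helper that cascades several blocks. It does not include the hall algorithm itself (early reflections and late reverb network), and the parameter values and the names `chorus` and `stacked_chorus` are hypothetical.

```python
import math


def chorus(x, fs, base_delay_ms=15.0, depth_ms=3.0, rate_hz=0.8, mix=0.5):
    """One chorus block: a sinusoidally modulated fractional delay mixed with the dry input."""
    if depth_ms > base_delay_ms:
        raise ValueError("depth_ms must not exceed base_delay_ms (delay would go negative)")

    def tap(k):
        # Zero outside the input buffer
        return x[k] if 0 <= k < len(x) else 0.0

    out = []
    for n, dry in enumerate(x):
        lfo = math.sin(2.0 * math.pi * rate_hz * n / fs)
        delay = (base_delay_ms + depth_ms * lfo) * fs / 1000.0  # in samples
        pos = n - delay
        i = math.floor(pos)
        frac = pos - i
        wet = (1.0 - frac) * tap(i) + frac * tap(i + 1)  # linear interpolation
        out.append((1.0 - mix) * dry + mix * wet)
    return out


def stacked_chorus(x, fs, stages):
    """Cascade several chorus blocks; each stage is a dict of chorus() keyword arguments."""
    for params in stages:
        x = chorus(x, fs, **params)
    return x


fs = 48000
signal = [math.sin(2.0 * math.pi * 440.0 * n / fs) for n in range(4800)]
stages = [
    dict(base_delay_ms=12.0, depth_ms=2.0, rate_hz=0.5, mix=0.5),
    dict(base_delay_ms=20.0, depth_ms=4.0, rate_hz=1.1, mix=0.4),
]
wet_signal = stacked_chorus(signal, fs, stages)
```

Using different LFO rates and depths in each stage keeps the modulations from lining up, which is one plausible way stacking yields a denser, less periodic movement than a single chorus.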
P8-8 Efficient Multi-Band Digital Audio Graphic Equalizer with Accurate Frequency Response Control—Richard J. Oliver, DTS, Inc. - Santa Ana, CA, USA; Jean-Marc Jot, DTS, Inc. - Los Gatos, CA, USA
Graphic equalizers give listeners an intuitive way to modify the frequency response of an audio signal: simply set the sliders to visually represent the desired curve, and the filter produces a frequency response of the corresponding shape. At least, that is the implied promise of the technology. However, the measured response of the equalizer can reveal some surprises. Filter inaccuracy, boost/cut asymmetry, and unexpected nulls can disappoint both the eye and the ear. An equalizer design is presented that uses efficient IIR filter sections tuned with a closed-form algorithm to give an accurate and intuitive frequency response with low complexity and minimal processing overhead. Design parameters and implementation details are discussed.
Convention Paper 9406
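To illustrate why a naive graphic equalizer misses its targets, the sketch below builds one peaking biquad per slider using the well-known RBJ Audio EQ Cookbook formulas and evaluates the magnitude response of the cascade. This is a swapped-in textbook design, not the closed-form tuning presented in the paper; the band centers, Q, and function names are assumptions.

```python
import cmath
import math


def peaking_biquad(f0, gain_db, q, fs):
    """Peaking-EQ biquad from the RBJ Audio EQ Cookbook, normalized so that a0 = 1."""
    big_a = 10.0 ** (gain_db / 40.0)
    w0 = 2.0 * math.pi * f0 / fs
    alpha = math.sin(w0) / (2.0 * q)
    cos_w0 = math.cos(w0)
    b = [1.0 + alpha * big_a, -2.0 * cos_w0, 1.0 - alpha * big_a]
    a = [1.0 + alpha / big_a, -2.0 * cos_w0, 1.0 - alpha / big_a]
    return [c / a[0] for c in b], [c / a[0] for c in a]


def graphic_eq(gains_db, centers_hz, q, fs):
    """One peaking section per slider: a naive graphic EQ with no interaction compensation."""
    return [peaking_biquad(f, g, q, fs) for f, g in zip(centers_hz, gains_db)]


def response_db(sections, f, fs):
    """Magnitude response (dB) of the cascade at frequency f (Hz)."""
    z_inv = cmath.exp(-2j * math.pi * f / fs)
    h = 1.0 + 0.0j
    for b, a in sections:
        h *= (b[0] + b[1] * z_inv + b[2] * z_inv ** 2) / (a[0] + a[1] * z_inv + a[2] * z_inv ** 2)
    return 20.0 * math.log10(abs(h))


def process(sections, x):
    """Filter a list of samples through the cascade (transposed direct form II)."""
    y = list(x)
    for b, a in sections:
        s1 = s2 = 0.0
        out = []
        for v in y:
            o = b[0] * v + s1
            s1 = b[1] * v - a[1] * o + s2
            s2 = b[2] * v - a[2] * o
            out.append(o)
        y = out
    return y


fs = 48000.0
centers = [31.25 * 2 ** k for k in range(10)]  # hypothetical octave-spaced bands
eq = graphic_eq([6.0] * 10, centers, q=1.41, fs=fs)
# With every slider at +6 dB, overlapping skirts push the response above 6 dB
print(response_db(eq, 1000.0, fs))
```

A single peaking section hits its target exactly at its center frequency, but in the cascade every neighboring boost adds gain there as well; compensating for this band interaction is the kind of inaccuracy a dedicated tuning algorithm has to address.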