AES San Francisco 2012
Paper Session P17

P17 - Spatial Audio Processing

Monday, October 29, 9:00 am — 1:00 pm (Room 121)

Chair:
Jean-Marc Jot, DTS, Inc. - Calabasas, CA, USA

P17-1 Comparing Separation Quality of Nonnegative Matrix Factorization and Nonnegative Matrix Factor 2-D Deconvolution in Audio Source Separation Tasks—Julian M. Becker, RWTH Aachen University - Aachen, Germany; Volker Gnann, RWTH Aachen University - Aachen, Germany
Nonnegative Matrix Factorization (NMF) is widely used in audio source separation tasks. However, the separation quality of NMF varies considerably depending on the mixture. In this paper we analyze the use of NMF in source separation tasks and show how separation results can be significantly improved by using Nonnegative Matrix Factor 2-D Deconvolution (NMF2D). NMF2D was originally proposed as an extension of NMF to circumvent the problem of grouping notes, but it is used differently in this paper: to improve the separation quality, without taking the problem of grouping notes into account.
Convention Paper 8801
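
As a rough illustration of the plain NMF baseline discussed in P17-1, the following minimal sketch factorizes a nonnegative matrix V (standing in for a magnitude spectrogram) into spectral templates W and activations H. It assumes the standard multiplicative updates for the Euclidean cost; the abstract does not say which cost function or update rule the authors use, and NMF2D is not shown. The function name `nmf` and the toy data are hypothetical.

```python
import numpy as np

def nmf(V, rank, n_iter=500, eps=1e-9, seed=0):
    """Factorize nonnegative V (freq x frames) as W @ H.

    Uses multiplicative updates for the Euclidean cost (an assumption,
    not necessarily the variant used in the paper).
    """
    rng = np.random.default_rng(seed)
    W = rng.random((V.shape[0], rank)) + eps   # spectral templates
    H = rng.random((rank, V.shape[1])) + eps   # time activations
    errors = []
    for _ in range(n_iter):
        # Each update multiplies by a nonnegative ratio, so W and H stay nonnegative.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
        errors.append(np.linalg.norm(V - W @ H) / np.linalg.norm(V))
    return W, H, errors

# Toy "spectrogram": two random templates mixed with random activations.
rng = np.random.default_rng(1)
V = rng.random((16, 2)) @ rng.random((2, 40))
W, H, errors = nmf(V, rank=2)
print("relative reconstruction error:", errors[-1])
```

In a separation setting, each source estimate would be rebuilt from the subset of columns of W (and rows of H) assigned to it; how components are grouped per source is exactly the issue the abstract mentions.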

P17-2 Aspects of Microphone Array Source Separation Performance—Bjoern Erlach, Stanford University - Stanford, CA, USA; Rob Bullen, SoundScience P/L - Australia; Jonathan S. Abel, Stanford University - Stanford, CA, USA
The performance of a blind source separation system based on a custom microphone array is explored. The system prioritizes artifact-free processing over source separation effectiveness and extracts source signals using a quadratically constrained least-squares fit based on estimated source arrival directions. The level of additive noise present in extracted source signals is computed empirically for various numbers of microphones used and different degrees of uncertainty in knowledge of microphone locations. The results are presented in comparison to analytical predictions. The source signal estimate variance is roughly inversely proportional to the number of sensors and roughly proportional to both the additive noise variance and microphone position error variance. Beyond a threshold the advantages of increased channel count and precise knowledge of the sensor locations are outweighed by other limitations.
Convention Paper 8802

P17-3 A New Algorithm for Generating Realistic Three-Dimensional Reverberation Based on Image Sound Source Distribution in Consideration of Room Shape Complexity—Toshiki Hanyu, Nihon University - Funabashi, Chiba, Japan
A new algorithm for generating realistic three-dimensional reverberation based on statistical room acoustics is proposed. The author has clarified the relationship between reflected sound density and mean free path in consideration of room shape complexity [Hanyu et al., Acoust. Sci. & Tech. 33, 3 (2012), 197–199]. Using this relationship, the new algorithm can statistically create image sound source distributions that reflect the room shape complexity, room absorption, and room volume by using random numbers for three-dimensional orthogonal coordinates. The image sound source distribution represents the characteristics of the three-dimensional reverberation of the room. Details of this algorithm and how to apply it to multichannel audio, binaural audio, game sound, and so on are introduced.
Convention Paper 8803

P17-4 Audio Signal Decorrelation Based on Reciprocal-Maximal Length Sequence Filters and Its Applications to Spatial Sound—Bo-sun Xie, South China University of Technology - Guangzhou, China; Bei Shi, South China University of Technology - Guangzhou, China; Ning Xiang, Rensselaer Polytechnic Institute - Troy, NY, USA
An algorithm for audio signal decorrelation is proposed. The algorithm is based on a pair of all-pass filters whose responses match a pair of reciprocal maximal length sequences (MLSs). Taking advantage of the uniform power spectrum and low-valued cross-correlation of reciprocal MLSs, the filters create a pair of output signals with low cross-correlation but magnitude spectra almost identical to that of the input signal. The proposed algorithm is applied to broadening the auditory source width and enhancing subjective envelopment in multichannel sound reproduction. Preliminary psychoacoustic experiments validate the performance of the proposed algorithm.
Convention Paper 8805
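
The two MLS properties that P17-4 relies on can be checked numerically. The sketch below generates an order-10 MLS with a simple shift register, takes its time reversal as the reciprocal sequence, and uses both directly as circular FIR kernels on a noise signal. Using the raw sequences as filter kernels is a simplifying assumption made here; the authors' actual all-pass filter design is not described in the abstract. The names `mls` and `circ_xcorr` and the tap choice (10, 3) are illustrative.

```python
import numpy as np

def mls(nbits, taps):
    """Return a +/-1 maximal length sequence from a Fibonacci shift register.

    `taps` are 1-based delays of the feedback recurrence; (10, 3) corresponds
    to a primitive polynomial of degree 10, giving period 2**10 - 1.
    """
    state = [1] * nbits
    out = []
    for _ in range(2 ** nbits - 1):
        out.append(state[-1])
        feedback = 0
        for t in taps:
            feedback ^= state[t - 1]
        state = [feedback] + state[:-1]
    return 2.0 * np.array(out) - 1.0

def circ_xcorr(a, b):
    """Periodic (circular) cross-correlation computed via the FFT."""
    return np.fft.ifft(np.fft.fft(a) * np.conj(np.fft.fft(b))).real

m = mls(10, taps=(10, 3))
m_rec = m[::-1].copy()          # reciprocal MLS = time-reversed MLS
N = len(m)

auto = circ_xcorr(m, m)         # N at lag 0, -1 at every other lag
cross = circ_xcorr(m, m_rec)    # stays small at all lags
peak_cross = np.max(np.abs(cross)) / N

# Filter one noise signal with both sequences (circular convolution).
rng = np.random.default_rng(0)
x = rng.standard_normal(N)
X = np.fft.fft(x)
y1 = np.fft.ifft(X * np.fft.fft(m)).real / np.sqrt(N + 1)
y2 = np.fft.ifft(X * np.fft.fft(m_rec)).real / np.sqrt(N + 1)
rho = np.corrcoef(y1, y2)[0, 1]
print("normalized peak cross-correlation:", peak_cross)
print("output correlation coefficient:", rho)
```

Because the MLS magnitude spectrum is flat apart from the DC bin, both outputs keep roughly the input's magnitude spectrum while their mutual correlation stays low.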

P17-5 Utilizing Instantaneous Direct-to-Reverberant Ratio in Parametric Spatial Audio Coding—Mikko-Ville Laitinen, Aalto University - Espoo, Finland; Ville Pulkki, Aalto University - Espoo, Finland
Scenarios with multiple simultaneous sources in an acoustically dry room may be challenging for parametric spatial sound reproduction techniques, such as directional audio coding (DirAC). It has been found that the decorrelation process used in the reproduction causes a perception of added reverberation. This paper suggests a model for DirAC reproduction that utilizes an estimate of the instantaneous direct-to-reverberant ratio. The sound is divided into reverberant and non-reverberant parts using this ratio, and decorrelation is applied only to the reverberant part. The results of formal listening tests are presented, showing that perceptually good audio quality can be obtained using this approach for both dry and reverberant scenarios.
Convention Paper 8804
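
The division described in P17-5 can be sketched as an energy-preserving split of each time-frequency value. The gain rule below (square roots of the direct and reverberant energy fractions derived from the ratio) is an assumption for illustration; the abstract does not give the authors' estimator or gain formula, and the function name `split_by_drr` is hypothetical.

```python
import numpy as np

def split_by_drr(x, drr_db):
    """Split time-frequency values x into direct and reverberant parts.

    drr_db is the direct-to-reverberant energy ratio per tile, in dB.
    The square-root energy gains are an assumed, energy-preserving rule.
    """
    drr = 10.0 ** (np.asarray(drr_db, dtype=float) / 10.0)
    direct_fraction = drr / (1.0 + drr)          # share of energy that is direct
    direct = np.sqrt(direct_fraction) * x
    reverberant = np.sqrt(1.0 - direct_fraction) * x
    return direct, reverberant

x = np.array([1.0, -2.0, 0.5])
drr_db = np.array([0.0, 10.0, -10.0])
direct, reverberant = split_by_drr(x, drr_db)
# Only `reverberant` would then be passed through a decorrelator.
```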

P17-6 A Downmix Approach with Acoustic Shadow, Low Frequency Effects, and Loudness Control—Regis Rossi A. Faria, University of São Paulo - Ribeirão Preto, Brazil; NEAC - Audio Engineering and Coding Center - University of São Paulo - São Paulo, Brazil; José Augusto Mannis, University of Campinas - Campinas, Brazil
Conventional algorithms for converting 5.1 sound fields into 2.0 have systematically suppressed the low frequency information and neglected spatial auditory effects produced by the real position of the loudspeakers in 3/2/1 arrangements. We designed a downmix variation in which the acoustic shadow of the listener's head and the LFE information are considered in the conversion, in order to investigate how such models can help improve the surround experience in two-channel modes and customize the downmix conversion so as to achieve the particular balance required by individual surround programs. A test implementation with integrated loudness control was carried out. Preliminary results show the potential benefits of this approach and point to critical parameters involved in the downmixing task.
Convention Paper 8806
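
For context on P17-6, the kind of conventional 5.1-to-2.0 conversion the abstract contrasts itself with can be sketched as a fixed matrix downmix. The default gains of about 0.7071 (-3 dB) for center and surrounds follow common practice, and the `lfe_gain` parameter is a hypothetical addition showing where LFE content could be reintroduced. The head-shadow model and loudness control described by the authors are not reproduced here.

```python
import numpy as np

def downmix_51_to_stereo(L, R, C, LFE, Ls, Rs,
                         center_gain=0.7071, surround_gain=0.7071,
                         lfe_gain=0.0):
    """Conventional matrix downmix of 5.1 channels to stereo.

    lfe_gain=0.0 discards the LFE channel, as many conventional
    downmixes do; a nonzero value mixes it equally into both outputs.
    """
    L, R, C, LFE, Ls, Rs = (np.asarray(ch, dtype=float)
                            for ch in (L, R, C, LFE, Ls, Rs))
    left = L + center_gain * C + surround_gain * Ls + lfe_gain * LFE
    right = R + center_gain * C + surround_gain * Rs + lfe_gain * LFE
    return left, right

# One-sample check: a signal only in the center channel.
zero, one = np.zeros(1), np.ones(1)
left, right = downmix_51_to_stereo(zero, zero, one, zero, zero, zero)
```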

P17-7 Direct-Diffuse Decomposition of Multichannel Signals Using a System of Pairwise Correlations—Jeffrey Thompson, DTS, Inc. - Calabasas, CA, USA; Brandon Smith, DTS, Inc. - Calabasas, CA, USA; Aaron Warner, DTS, Inc. - Calabasas, CA, USA; Jean-Marc Jot, DTS, Inc. - Calabasas, CA, USA
Decomposing an arbitrary audio signal into direct and diffuse components is useful for applications such as spatial audio coding, spatial format conversion, binaural rendering, and spatial audio enhancement. This paper describes direct-diffuse decomposition methods for multichannel signals using a linear system of pairwise correlation estimates. The expected value of a correlation coefficient is analytically derived from a signal model with known direct and diffuse energy levels. It is shown that a linear system can be constructed from pairwise correlation coefficients to derive estimates of the Direct Energy Fraction (DEF) for each channel of a multichannel signal. Two direct-diffuse decomposition methods are described that utilize the DEF estimates within a time-frequency analysis-synthesis framework.
Convention Paper 8807
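
One way a linear system can arise from pairwise correlations, as P17-7 describes, is sketched below under a simplified signal model that is an assumption here and not necessarily the authors' formulation: each channel i carries a fraction phi_i of direct energy that is fully correlated across channels, plus diffuse energy uncorrelated with everything else. The expected correlation magnitude is then |rho_ij| = sqrt(phi_i * phi_j), and taking logarithms gives log|rho_ij| = 0.5 * log(phi_i) + 0.5 * log(phi_j), which is linear in the unknowns log(phi_i). The function name `estimate_def` is hypothetical.

```python
import numpy as np
from itertools import combinations

def estimate_def(rho_mag, floor=1e-12):
    """Estimate per-channel direct energy fractions from |rho_ij|.

    Solves log|rho_ij| = 0.5*log(phi_i) + 0.5*log(phi_j) for all channel
    pairs in the least-squares sense (assumed model, see lead-in).
    Needs at least three channels for a unique solution.
    """
    n = rho_mag.shape[0]
    pairs = list(combinations(range(n), 2))
    A = np.zeros((len(pairs), n))
    b = np.zeros(len(pairs))
    for row, (i, j) in enumerate(pairs):
        A[row, i] = 0.5
        A[row, j] = 0.5
        b[row] = np.log(max(rho_mag[i, j], floor))  # avoid log(0)
    log_phi, *_ = np.linalg.lstsq(A, b, rcond=None)
    return np.clip(np.exp(log_phi), 0.0, 1.0)

# Synthetic check: build ideal correlations from known fractions.
phi_true = np.array([0.9, 0.5, 0.2, 0.7, 0.4])
rho = np.sqrt(np.outer(phi_true, phi_true))
phi_est = estimate_def(rho)
print(phi_est)
```

With measured (noisy) correlation estimates per time-frequency tile, the least-squares solve averages the inconsistency across pairs instead of recovering the fractions exactly.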

P17-8 A New Method of Multichannel Surround Sound Coding Utilizing Head Dynamic Cue—Qinghua Ye, Institute of Acoustics, Chinese Academy of Sciences - Beijing, China; Lingling Zhang, Beijing, China; Hefei Yang, Chinese Academy of Sciences - Beijing, China; Xiaodong Li, Chinese Academy of Sciences - Beijing, China
Considering the disadvantages of conventional matrix surround systems, a new method of multichannel surround sound coding is proposed. At the encoder, multichannel signals are converted into two pairs of virtual stereo signals, which are modulated to simulate the dynamic cue produced by slight rotation of the head. The left and right channels of the two virtual stereo pairs are then summed, respectively, into a two-channel stereo signal for transmission. At the decoder, the front and back stereo pairs are separated by extracting the dynamic cue and then redistributed to the multichannel system. This new method can achieve better surround sound reproduction without affecting the sound of the downmixed stereo.
Convention Paper 8808

Audio Engineering Society