AES London 2010
Paper Session P4
P4 - Spatial Signal Processing
Saturday, May 22, 14:00 — 18:00 (Room C3)
Chair: Francis Rumsey
P4-1 Classification of Time-Frequency Regions in Stereo Audio—Aki Härmä, Philips Research Europe - Eindhoven, The Netherlands
This paper addresses the classification of time-frequency (TF) regions in stereo audio data according to the type of mixture each region represents. Detecting the type of mixing is necessary, for example, in source separation, upmixing, and audio manipulation applications. We propose a generic signal model and a method to classify the TF regions into six classes corresponding to different combinations of central, panned, and uncorrelated sources. We give an overview of traditional techniques for comparing frequency-domain data and propose a new classification approach based on measures specially trained for the six classes. The performance of the new measures is studied and demonstrated using synthetic and real audio data.
Convention Paper 7980
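A minimal sketch of the underlying idea in Python, using short-time inter-channel coherence and level difference as cues: the paper's six classes and its trained class-specific measures are not reproduced here, and the thresholds and coarse three-way labeling below are arbitrary assumptions.

```python
import numpy as np
from scipy.signal import stft

def label_tf_regions(left, right, fs, nperseg=1024, avg_frames=8):
    # Coarse labeling of STFT tiles from short-time inter-channel cues
    _, _, L = stft(left, fs, nperseg=nperseg)
    _, _, R = stft(right, fs, nperseg=nperseg)
    k = np.ones(avg_frames) / avg_frames
    # Smooth auto- and cross-spectra over time so coherence is meaningful
    sm = lambda X: np.apply_along_axis(lambda v: np.convolve(v, k, mode="same"), 1, X)
    pll, prr = sm(np.abs(L) ** 2), sm(np.abs(R) ** 2)
    plr = sm(L * np.conj(R))
    coh = np.abs(plr) / np.sqrt(pll * prr + 1e-12)        # inter-channel coherence
    lev = 10.0 * np.log10((pll + 1e-12) / (prr + 1e-12))  # level difference in dB
    return np.where(coh < 0.7, "uncorrelated",
                    np.where(np.abs(lev) < 1.5, "central", "panned"))
```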
P4-2 A Comparison of Computational Precedence Models for Source Separation in Reverberant Environments—Christopher Hummersone, Russell Mason, Tim Brookes, University of Surrey - Guildford, UK
Reverberation continues to be problematic in many areas of audio and speech processing, including source separation. The precedence effect is an important psychoacoustic mechanism that assists human localization by suppressing the perception of reflections arising from room boundaries. Numerous computational precedence models have been developed over the years, and they suggest quite different strategies for handling reverberation. However, relatively little work has been done on incorporating precedence into source separation. This paper details a study comparing several computational precedence models and their impact on the performance of a baseline separation algorithm. The models are tested in a range of reverberant rooms and with a range of other mixture parameters. Large differences in the performance of the models are observed. The results show that a model based on interaural coherence produces the greatest performance gain over the baseline algorithm.
Convention Paper 7981
P4-3 Converting Stereo Microphone Signals Directly to MPEG-Surround—Christophe Tournery, Christof Faller, Illusonic LLC - Lausanne, Switzerland; Fabian Kuech, Jürgen Herre, Fraunhofer Institute for Integrated Circuits IIS - Erlangen, Germany
We have previously proposed a way to use stereo microphones with spatial audio coding to record and code surround sound. In this paper we describe further considerations and improvements needed to convert stereo microphone signals directly to MPEG Surround, i.e., a downmix signal plus a bit stream. We describe in detail how to obtain from the microphone channels the information needed to compute the MPEG Surround spatial parameters, and how to process the microphone signals to transform them into an MPEG Surround compatible downmix.
Convention Paper 7982
P4-4 Modification of Spatial Information in Coincident-Pair Recordings—Jeremy Wells, University of York - York, UK
A novel method is presented for modifying the spatial information contained in the output of a stereo coincident pair of microphones. The purpose of this method is to provide additional decorrelation between the left and right replay channels for sound arriving at the sides of the coincident pair, while retaining imaging accuracy for sounds arriving from the front or rear, or where the entire sound field is highly correlated. Details of how this is achieved are given, and results for different types of sound field are presented.
Convention Paper 7983
P4-5 Unitary Matrix Design for Diffuse Jot Reverberators—Fritz Menzer, Christof Faller, Ecole Polytechnique Fédérale de Lausanne - Lausanne, Switzerland
This paper presents different methods for designing unitary mixing matrices for Jot reverberators, with particular emphasis on cases where no early reflections are to be modeled. Possible applications include diffuse sound reverberators and decorrelators. The trade-off between effective mixing between channels and the number of multiply operations per channel and output sample is investigated, as well as the relationship between the sparseness of powers of the mixing matrix and the sparseness of the impulse response.
Convention Paper 7984
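A minimal sketch of such a structure, assuming a normalized Hadamard matrix as the unitary mixing matrix, arbitrary delay lengths and feedback gain, and no early-reflection modeling; the dense n-by-n matrix makes the multiply-count side of the trade-off explicit (n multiplies per channel per output sample).

```python
import numpy as np

def hadamard(n):
    # Sylvester construction; n must be a power of two. Scaling makes it orthogonal.
    H = np.array([[1.0]])
    while H.shape[0] < n:
        H = np.block([[H, H], [H, -H]])
    return H / np.sqrt(n)

def jot_reverb(x, delays, g=0.97, tail=48000):
    # Jot-style feedback delay network with a unitary feedback matrix
    n = len(delays)
    A = g * hadamard(n)                      # unitary matrix scaled for stability
    bufs = [np.zeros(d) for d in delays]     # circular delay-line buffers
    idx = np.zeros(n, dtype=int)
    y = np.zeros(len(x) + tail)
    for t in range(len(y)):
        taps = np.array([bufs[i][idx[i]] for i in range(n)])  # delay-line outputs
        y[t] = taps.sum()
        fb = A @ taps                        # mix delay-line outputs through A
        xin = x[t] if t < len(x) else 0.0
        for i in range(n):
            bufs[i][idx[i]] = xin + fb[i]    # write new input + feedback sample
            idx[i] = (idx[i] + 1) % delays[i]
    return y

# Example: 8 mutually prime delay lengths, impulse input
# y = jot_reverb(np.r_[1.0, np.zeros(999)],
#                [1117, 1559, 1831, 2057, 2251, 2399, 2521, 2677])
```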
P4-6 Sound Field Indicators for Hearing Activity and Reverberation Time Estimation in Hearing Instruments—Andreas P. Streich, ETH Zurich - Zurich, Switzerland; Manuela Feilner, Alfred Stirnemann, Phonak AG - Stäfa, Switzerland; Joachim M. Buhmann, ETH Zurich - Zurich, Switzerland
Sound field indicators (SFIs) are proposed as a new feature set for estimating the hearing activity and reverberation time in hearing instruments. SFIs are based on physical measurements of the sound field. A variant thereof, called SFI short-time statistics (SFIst2), is obtained by computing the mean and standard deviation of the SFIs over 10 subframes. To show the utility of these feature sets for the mentioned prediction tasks, experiments are carried out on artificially reverberated recordings of a large variety of sounds encountered in daily life. In a classification scenario where the hearing activity is to be predicted, both SFI and SFIst2 yield clearly superior accuracy, even compared to hand-tailored features used in state-of-the-art hearing instruments. For regression on the reverberation time, the SFI-based features yield a lower residual error than standard feature sets and reach the performance of specially designed features. The hearing activity classification relies mainly on the averages of the SFIs, while the standard deviations over subframes are used heavily to predict the reverberation time.
Convention Paper 7985
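A minimal sketch of the SFIst2 aggregation, with the underlying sound field indicators left as a placeholder function since their physical definitions are not given in the abstract:

```python
import numpy as np

def sfi_short_time_stats(frame, sfi_func, n_sub=10):
    # Split a frame into 10 subframes, evaluate the indicators on each, and
    # concatenate the per-indicator mean and standard deviation (SFIst2).
    subframes = np.array_split(frame, n_sub)
    sfis = np.array([sfi_func(s) for s in subframes])  # shape: (n_sub, n_indicators)
    return np.concatenate([sfis.mean(axis=0), sfis.std(axis=0)])
```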
P4-7 Stereo-to-Binaural Conversion Using Interaural Coherence Matching—Fritz Menzer, Christof Faller, Ecole Polytechnique Fédérale de Lausanne - Lausanne, Switzerland
In this paper a method for converting stereo recordings to simulated binaural recordings is presented. The stereo signal is separated into coherent and diffuse sound based on the assumption that the signal comes from a coincident symmetric microphone setup. The coherent part is reproduced using HRTFs, and the diffuse part is reproduced using filters that adapt its interaural coherence to that of a binaural recording of diffuse sound.
Convention Paper 7986
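A minimal sketch of coherence matching for the diffuse part, assuming two mutually incoherent diffuse signals, a free-field diffuse coherence target (sinc of the ear spacing) rather than a measured binaural one, and single-FFT processing instead of frame-by-frame filtering; the paper's actual target coherence and adaptive filters may differ.

```python
import numpy as np

def match_interaural_coherence(d_left, d_right, fs, ear_dist=0.18, c=343.0, nfft=4096):
    DL = np.fft.rfft(d_left, nfft)
    DR = np.fft.rfft(d_right, nfft)
    f = np.fft.rfftfreq(nfft, 1.0 / fs)
    rho = np.sinc(2.0 * f * ear_dist / c)    # target interaural coherence per frequency
    # Mix the incoherent pair so the outputs have coherence rho at each frequency
    YL = DL
    YR = rho * DL + np.sqrt(np.maximum(0.0, 1.0 - rho ** 2)) * DR
    return np.fft.irfft(YL, nfft), np.fft.irfft(YR, nfft)
```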
P4-8 Linear Simulation of Spaced Microphone Arrays Using B-Format Recordings—Andreas Walther, Christof Faller, Ecole Polytechnique Fédérale de Lausanne - Lausanne, Switzerland
A novel approach for linear post-processing of B-format recordings is presented. The goal is to simulate spaced microphone arrays by approximating and virtually recording the sound field at the position of each individual microphone. The delays occurring in non-coincident recordings are simulated by translating an approximate plane-wave representation of the sound field to the positions of the microphones. The directional responses of the spaced microphones are approximated by linear combinations of the corresponding translated B-format channels.
Convention Paper 7987
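A minimal sketch of the plane-wave translation idea, assuming a 2-D sound field, one plane wave per TF bin estimated from the active intensity, FuMa-normalized B-format channels, and a cardioid virtual microphone; the paper's exact formulation may differ.

```python
import numpy as np
from scipy.signal import stft, istft

def simulate_spaced_mic(W, X, Y, fs, mic_pos, mic_dir, c=343.0, nperseg=1024):
    # mic_pos: spaced microphone position in meters relative to the B-format origin
    # mic_dir: its look direction in radians
    f_, _, Wf = stft(W, fs, nperseg=nperseg)
    _, _, Xf = stft(X, fs, nperseg=nperseg)
    _, _, Yf = stft(Y, fs, nperseg=nperseg)
    # Per-bin plane-wave arrival (source) direction from the active intensity vector
    az = np.arctan2(np.real(np.conj(Wf) * Yf), np.real(np.conj(Wf) * Xf))
    # Translate the plane wave: a microphone displaced toward the source receives
    # the wavefront earlier, i.e., a time advance of (mic_pos . u_source) / c
    adv = (mic_pos[0] * np.cos(az) + mic_pos[1] * np.sin(az)) / c
    shift = np.exp(2j * np.pi * f_[:, None] * adv)
    Wt, Xt, Yt = Wf * shift, Xf * shift, Yf * shift
    # Cardioid virtual microphone as a linear combination of the translated channels
    S = 0.5 * (np.sqrt(2.0) * Wt + np.cos(mic_dir) * Xt + np.sin(mic_dir) * Yt)
    _, s = istft(S, fs, nperseg=nperseg)
    return s
```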