AES Budapest 2012
Paper Session P15
P15 - Spatial Audio: Part 1
Saturday, April 28, 09:00 — 11:30 (Room: Lehar)
Chair: Christof Faller
P15-1 Analysis on Error Caused by Multi-Scattering of Multiple Sound Sources in HRTF Measurement—Guangzheng Yu, Bosun Xie, Zewei Chen, Yu Liu, South China University of Technology - Guangzhou, China
A model consisting of two pulsating spherical sound sources and a rigid spherical head is proposed to evaluate the error caused by multi-scattering of multiple sound sources in HRTF measurement. The results indicate that the ipsilateral error below 20 kHz caused by multi-scattering is within ±1 dB when the radius of the sound sources does not exceed 0.025 m, the source distance to the head center is not less than 0.5 m, and the angular interval between two adjacent sources is not less than 25 degrees. This accuracy is generally sufficient for ipsilateral HRTF measurements. To improve the accuracy of contralateral HRTF measurements, sound-absorbing treatment of the source surface is necessary.
Convention Paper 8643
P15-2 Personalization of Headphone Spatialization Based on the Relative Localization Error in an Auditory Gaming Interface—Aki Härmä, Ralph van Dinther, Thomas Svedström, Munhum Park, Jeroen Koppens, Philips Research Europe - Eindhoven, The Netherlands
In binaural sound reproduction applications using head-related transfer functions (HRTFs), it is beneficial for the properties of the HRTFs to correspond to the personal characteristics of the user's real HRTFs. In this paper we propose a method to choose HRTFs using a relative localization test. This allows us to select the best HRTFs using a simple auditory interface. It is possible to design the HRTF personalization interface in a consumer device as an auditory game where the task of the user is to place sound objects in relation to each other. Two different interfaces are compared in a listening test. The results of the tests reported in the current paper are mixed and do not give a conclusive picture of the performance of the proposed system; however, they do give interesting insights into the properties of binaural listening.
Convention Paper 8644
P15-3 Robustness of a Mixed-Order Ambisonics Microphone Array for Sound Field Reproduction—Marton Marschall, Sylvain Favrot, Technical University of Denmark - Lyngby, Denmark; Jörg Buchholz, National Acoustic Laboratories - Chatswood, Australia
Spherical microphone arrays can be used to capture and reproduce the spatial characteristics of acoustic scenes. A mixed-order Ambisonics (MOA) approach was recently proposed to improve the horizontal spatial resolution of microphone arrays with a given number of transducers. In this paper the performance of an MOA array and its robustness to variations in microphone characteristics and to self-noise were investigated. Two array processing strategies were evaluated. Results showed that the expected performance benefits of MOA are achieved at high frequencies, and that robustness to various errors was similar to that of HOA arrays with both strategies. The approach based on minimizing the error of the reproduced spherical harmonic functions showed better performance at high frequencies for the MOA layout.
Convention Paper 8645
P15-4 An Algorithm for Efficiently Synthesizing Multiple Near-Field Virtual Sources in Dynamic Virtual Auditory Display—Bosun Xie, Chengyun Zhang, South China University of Technology - Guangzhou, China
An algorithm for efficiently synthesizing multiple near-field virtual sources in dynamic virtual auditory display (VAD) is proposed. Using principal component analysis, a set of measured near-field head-related impulse responses (HRIRs) for a KEMAR manikin at various source directions and distances is decomposed into a weighted sum of 15 time-domain basis functions plus a mean time-domain function, in which the time-independent weights represent the location dependence of the HRIRs. Accordingly, multiple virtual sources at various locations are synthesized by a common bank of 16 filters representing the time-domain basis functions and the mean time-domain function, in which the adjustable gains of the filters for each input stimulus (as well as an overall gain and delay for each input stimulus) control the intended source locations. The computational cost of the proposed algorithm is reduced compared with that of conventional ones. Psychoacoustic experiments via a dynamic VAD with head-tracking validate the performance of the proposed algorithm.
Convention Paper 8646
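The decomposition described in the P15-4 abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `pca_decompose` and `synthesize` are hypothetical, the sketch covers a single ear, and it leaves out the per-source overall gain and delay that the abstract mentions. It assumes the HRIRs are stored as a NumPy array with one row per source location.

```python
import numpy as np

def pca_decompose(hrirs, n_components=15):
    """Split HRIRs of shape (n_locations, n_taps) into mean, basis, weights."""
    mean = hrirs.mean(axis=0)                    # mean time-domain function
    centered = hrirs - mean
    # Rows of vt are orthonormal time-domain basis functions
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    basis = vt[:n_components]                    # (n_components, n_taps)
    weights = centered @ basis.T                 # location-dependent gains
    return mean, basis, weights

def synthesize(signals, weights, mean, basis):
    """Render several sources through one shared bank of filters.

    signals: (n_sources, n_samples), weights: (n_sources, n_components).
    """
    # Apply the per-source gains first, so each basis filter runs only once
    mixed = weights.T @ signals                  # (n_components, n_samples)
    # The mean filter receives the plain sum of all inputs
    out = np.convolve(signals.sum(axis=0), mean)
    for q in range(basis.shape[0]):
        out += np.convolve(mixed[q], basis[q])
    return out
```

With 15 basis filters plus the mean filter, the number of convolutions stays at 16 regardless of how many sources are rendered, whereas convolving each source with its own HRIR costs one convolution per source. Moving a source only changes its row of gains in `weights`.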
P15-5 Scalable Coding of Three-Dimensional Multichannel Sound—Design of Conversion Matrix and Modeling of Unmasking Phenomenon—Akio Ando, NHK Science and Technical Research Laboratories - Setagaya, Tokyo, Japan, and Tokyo Institute of Technology, Meguro, Tokyo, Japan
We propose two methods for coding and transmitting three-dimensional multichannel sound signals: scalable coding and transmission, and modeling of the quantization error. The first method converts N-channel sound signals into M-channel basic signals and (N-M)-channel additional signals using a matrix operation. The matrix is trained by simulated annealing to minimize its condition number and the energy of the additional signals. An unmasking artifact may occur when the N-channel signals are restored from the decoded signals using the inverse matrix. The second method estimates the quantization error signals by polynomial approximation of the decoded signals. Experimental results showed that the combination of both methods could realize a 1.2 Mbps scalable transmission of 22-channel sound without notable sound degradation.
Convention Paper 8647
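The matrix conversion in the P15-5 abstract can be sketched as follows. This is only an illustration under stated assumptions: the function names are hypothetical, the way `cost` combines the condition number with the additional-signal energy (a plain sum weighted by `alpha`) is an assumption rather than the paper's objective, and neither the simulated-annealing search nor the quantization-error model is shown.

```python
import numpy as np

def split_channels(A, x, m):
    """Convert N-channel signals x (N, T) into M basic and N-M additional signals."""
    y = A @ x
    return y[:m], y[m:]

def restore_channels(A, basic, additional):
    """Recover the N-channel signals by applying the inverse of A."""
    y = np.vstack([basic, additional])
    return np.linalg.solve(A, y)     # same result as inv(A) @ y, better conditioned

def cost(A, x, m, alpha=1.0):
    """Assumed objective: condition number plus weighted additional-signal energy."""
    _, additional = split_channels(A, x, m)
    return np.linalg.cond(A) + alpha * np.mean(additional ** 2)
```

A decoder that receives only the basic signals can play them as an M-channel program, while one that also receives the additional signals can call `restore_channels` to get all N channels back. A small condition number matters because any error in the decoded signals is amplified by the inverse matrix during restoration.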