AES New York 2015
Paper Session P15
P15 - Spatial Audio—Part 2
Saturday, October 31, 2:00 pm — 5:30 pm (Room 1A08)
Chair:
Filippo Maria Fazi, University of Southampton - Southampton, Hampshire, UK
P15-1 Capturing the Elevation Dependence of Interaural Time Difference with an Extension of the Spherical-Head Model—Rahulram Sridhar, 3D3A Lab, Princeton University - Princeton, NJ, USA; Edgar Choueiri, Princeton University - Princeton, NJ, USA
An extension of the spherical-head model (SHM) is developed to incorporate the elevation dependence observed in measured interaural time differences (ITDs). The model aims to address the inability of the SHM to capture this elevation dependence, thereby improving ITD estimation accuracy while retaining the simplicity of the SHM. To do so, the proposed model uses an elevation-dependent head radius that is individualized from anthropometry. Calculations of ITD for 12 listeners show that the proposed model is able to capture this elevation dependence and, for high frequencies and at large azimuths, yields a reduction in mean ITD error of up to 13 microseconds (3% of the measured ITD value), compared to the SHM. For low-frequency ITDs, this reduction is up to 160 microseconds (23%).
Convention Paper 9447 (Purchase now)
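The spherical-head model referenced in this abstract is commonly written in its Woodworth form, ITD(θ) = (a/c)(θ + sin θ), for a rigid sphere of radius a and source azimuth θ. The sketch below illustrates how an elevation-dependent radius could be slotted into that formula; the cosine parameterization and its coefficients are illustrative assumptions, not the anthropometry-based individualization described in the paper.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s

def itd_spherical_head(azimuth_rad, head_radius_m, c=SPEED_OF_SOUND):
    """Woodworth-style spherical-head ITD: ITD = (a / c) * (theta + sin(theta)),
    the classic approximation for a rigid sphere with diametrically opposed ears."""
    return (head_radius_m / c) * (azimuth_rad + np.sin(azimuth_rad))

def itd_elevation_dependent(azimuth_rad, elevation_rad, a0, a1, c=SPEED_OF_SOUND):
    """Hypothetical elevation-dependent variant: the fixed radius is replaced by a
    radius that varies smoothly with source elevation.  The cosine form and the
    coefficients a0, a1 are assumptions for illustration, not the paper's model."""
    a_phi = a0 + a1 * np.cos(elevation_rad)  # assumed form of a(phi)
    return itd_spherical_head(azimuth_rad, a_phi, c)

# Example: 90-degree azimuth, 0 vs. 45 degrees elevation, nominal 8.75 cm radius
print(itd_elevation_dependent(np.pi / 2, 0.0, 0.0875, 0.005) * 1e6, "us")
print(itd_elevation_dependent(np.pi / 2, np.pi / 4, 0.0875, 0.005) * 1e6, "us")
```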
P15-2 Temporal Reliability of Subjectively Selected Head-Related Transfer Functions (HRTFs) in a Non-Eliminating Discrimination Task—Yunhao Wan, University of Florida - Gainesville, FL, USA; Ziqi Fan, University of Florida - Gainesville, FL, USA; Kyla McMullen, University of Florida - Gainesville, FL, USA
The emergence of commercial virtual reality devices has reinvigorated the need for research in realistic audio for virtual environments. Realistic virtual audio is often realized through head-related transfer functions (HRTFs), which are costly to measure and specific to each listener, making their use difficult to scale. Subjective selection allows a listener to pick their own HRTF from a database of premeasured HRTFs. While this is a more scalable option, further research is needed to examine listeners' consistency in choosing their own HRTFs. The present study extends the current subjective selection research by quantifying the reliability of HRTFs subjectively selected by 12 participants over time in a non-eliminating perceptual discrimination task.
Convention Paper 9448 (Purchase now)
P15-3 Plane-Wave Decomposition with Aliasing Cancellation for Binaural Sound Reproduction—David L. Alon, Ben-Gurion University of the Negev - Beer-Sheva, Israel; Jonathan Sheaffer, Ben-Gurion University of the Negev - Beer-Sheva, Israel; Boaz Rafaely, Ben-Gurion University of the Negev - Beer-Sheva, Israel
Spherical microphone arrays are used for capturing three-dimensional sound fields, from which binaural signals can be obtained. Plane-wave decomposition of the sound field is typically employed in the first stage of the processing. However, with practical arrays the upper operating frequency is limited by spatial aliasing. In this paper a measure of plane-wave decomposition error is formulated to highlight the problem of spatial aliasing. A novel method for plane-wave decomposition at frequencies that are typically considered above the maximal operating frequency is then presented, based on the minimization of aliasing error. The mathematical analysis is complemented by a simulation study and by a preliminary listening experiment. Results show a clear perceptual improvement when aliasing cancellation is applied to aliased binaural signals, indicating that the proposed method can be used to extend the bandwidth of binaural signals rendered from microphone array recordings.
Convention Paper 9449 (Purchase now)
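For context on the aliasing limit discussed above: as a textbook rule of thumb, a spherical array of harmonic order N is essentially aliasing-free only up to kr ≈ N. The sketch below computes that upper operating frequency and a regularized open-sphere radial inversion of the kind used in conventional plane-wave decomposition; the regularization, the open-sphere assumption, and the example array dimensions are illustrative assumptions and do not reflect the paper's aliasing-cancellation method.

```python
import numpy as np
from scipy.special import spherical_jn

C = 343.0  # speed of sound, m/s

def aliasing_frequency(order, radius_m, c=C):
    """Rule of thumb: an order-N spherical array is aliasing-free only up to
    kr ~ N, i.e. f ~ N * c / (2 * pi * r)."""
    return order * c / (2 * np.pi * radius_m)

def open_sphere_radial_weight(n, kr, reg=1e-2):
    """Regularized inverse of the open-sphere radial term b_n(kr) = 4 pi i^n j_n(kr),
    as used in conventional plane-wave decomposition.  The Tikhonov-style
    regularization constant is an assumption, not the paper's method."""
    bn = 4 * np.pi * (1j ** n) * spherical_jn(n, kr)
    return np.conj(bn) / (np.abs(bn) ** 2 + reg)

# A 4th-order array of 4.2 cm radius (assumed example dimensions):
print("upper operating frequency ~ %.0f Hz" % aliasing_frequency(4, 0.042))
print("order-2 radial weight at kr = 1:", open_sphere_radial_weight(2, 1.0))
```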
P15-4 Modeling ITDs Based on Photographic Head Information—Jordan Juras, New York University - New York, NY, USA; Chris Miller, New York University - New York, NY, USA; Agnieszka Roginska, New York University - New York, NY, USA
Research has shown that personalized spatial cues used in 3D sound simulation improve the perception and quality of the sound image. This paper introduces a simple method for photographically extracting the size of the head and proposes a fitted spherical head model to more accurately predict interaural time differences (ITDs). Head-related impulse responses (HRIRs) were measured for eleven subjects, and ITDs were extracted from the measurements. Based on a photograph of each subject's face, the distance between the ears was measured and used to model that subject's personal ITDs. A head model is proposed that adjusts the spherical head model to more accurately model ITDs. Acoustic measurements of ITDs are then compared to the modeled ITDs, demonstrating the effectiveness of the proposed method for photographically extracting personalized ITDs.
Convention Paper 9450 (Purchase now)
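One plausible reading of a "fitted spherical head model" is a least-squares fit of a single effective head radius to the measured ITDs, starting from the photo-derived ear-to-ear distance. The sketch below shows that reading with synthetic data; the fitting procedure and the numbers are assumptions, not the adjustment actually proposed in the paper.

```python
import numpy as np

C = 343.0  # speed of sound, m/s

def woodworth_itd(azimuth_rad, radius_m, c=C):
    """Standard spherical-head (Woodworth) ITD approximation."""
    return (radius_m / c) * (azimuth_rad + np.sin(azimuth_rad))

def fit_effective_radius(azimuths_rad, measured_itds_s):
    """Least-squares fit of one effective head radius to measured ITDs.
    Because ITD = a * (theta + sin(theta)) / c is linear in a, the fit has a
    closed form.  This is only an assumed interpretation of the paper's fit."""
    basis = (azimuths_rad + np.sin(azimuths_rad)) / C        # ITD = a * basis
    return float(basis @ measured_itds_s / (basis @ basis))  # closed-form LS

# Hypothetical example: photo-derived ear-to-ear distance of 15 cm,
# compared against synthetic "measured" ITDs with a little noise.
photo_radius = 0.15 / 2.0
az = np.radians([0.0, 30.0, 60.0, 90.0])
measured = woodworth_itd(az, 0.088) + 5e-6 * np.random.randn(az.size)
print("photo radius %.3f m, fitted radius %.3f m"
      % (photo_radius, fit_effective_radius(az, measured)))
```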
P15-5 Recalibration of Virtual Sound Localization Using Audiovisual Interactive Training—Xiaoli Zhong, South China University of Technology - Guangzhou, China; Jie Zhang, South China University of Technology - Guangzhou, China; Guangzheng Yu, South China University of Technology - Guangzhou, Guangdong, China
In virtual auditory display, non-individual head-related transfer functions (HRTFs) of KEMAR result in localization degradation. This work investigates the efficacy of audiovisual interactive training in recalibrating listeners and reducing this degradation. First, an audiovisual interactive training system consisting of a control module, a binaural virtual sound module, and a vision module was constructed. Then ten subjects were divided into a control group and a training group and underwent three days of training and localization tests. Results indicate that in the horizontal plane, the localization accuracy of azimuth is significantly improved with training and front-back confusion is also reduced; however, in the median plane, three days of short-term training yields no significant improvement in the localization accuracy of elevation.
Convention Paper 9451 (Purchase now)
P15-6 Analysis and Experiment on Summing Localization of Two Loudspeakers in the Median Plane—Bosun Xie, South China University of Technology - Guangzhou, China; Dan Rao, South China University of Technology - Guanzhou, Guangdong, China
Based on the hypothesis that the change of interaural time difference caused by head rotation and tilting provides dynamic cues for front-back and vertical localization, low-frequency localization equations or panning laws for multiple loudspeakers in the median plane were derived in our previous work. In the present work we provide further psychoacoustic explanation of these equations and use them to analyze the summing localization of two loudspeakers with various configurations and pair-wise amplitude panning in the median plane. The relationship between the current method and other localization theories is also analyzed. Results indicate that for some configurations, pair-wise amplitude panning is able to create virtual sources between the loudspeakers, whereas for other configurations it is not. A virtual source localization experiment yields results consistent with the analysis, thereby validating the proposed method.
Convention Paper 9452 (Purchase now)
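For readers unfamiliar with pair-wise amplitude panning itself, the sketch below shows a generic constant-power pan law between two median-plane loudspeakers. It illustrates only the panning stage; the localization equations that predict whether such a panned source is actually perceived between the loudspeakers are the subject of the paper and are not reproduced here.

```python
import numpy as np

def pairwise_pan_gains(target_elev_deg, elev1_deg, elev2_deg):
    """Constant-power pair-wise amplitude panning between two median-plane
    loudspeakers, using a sine/cosine law on the normalized target position.
    This is a generic pan law, not the paper's localization prediction."""
    frac = (target_elev_deg - elev1_deg) / (elev2_deg - elev1_deg)  # 0..1
    frac = np.clip(frac, 0.0, 1.0)
    g1, g2 = np.cos(frac * np.pi / 2), np.sin(frac * np.pi / 2)
    return g1, g2  # g1**2 + g2**2 == 1, i.e. constant power

# Pan a source halfway between loudspeakers at 0 and 60 degrees elevation:
print(pairwise_pan_gains(30.0, 0.0, 60.0))
```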
P15-7 Immersive Audio Content Creation Using Mobile Devices and Ethernet AVB—Richard Foss, Rhodes University - Grahamstown, Eastern Cape, South Africa; Antoine Rouget, DSP4YOU Ltd. - Kowloon, Hong Kong
The goal of immersive sound systems is to localize multiple sound sources such that listeners are enveloped in sound. This paper describes an immersive sound system that allows for the creation of immersive sound content and real-time control over sound source localization. It is a client/server system in which the client is a mobile device. The server receives localization control messages from the client and uses an Ethernet AVB network to distribute appropriate mix levels to loudspeakers with built-in signal processing.
Convention Paper 9453 (Purchase now)
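The abstract does not specify the control-message format or the panning law, so the sketch below is purely hypothetical: a JSON localization message from the mobile client is turned into per-speaker mix levels using simple distance-based amplitude panning, which a server could then distribute to the loudspeakers over the AVB network.

```python
import json
import numpy as np

# Hypothetical loudspeaker layout (x, y in meters); the real system's layout,
# message format, and panning law are not described in the abstract.
SPEAKERS = np.array([[-2.0, 2.0], [2.0, 2.0], [-2.0, -2.0], [2.0, -2.0]])

def mix_levels_for_source(control_message_json, rolloff=1.0):
    """Turn a client localization message into per-speaker mix levels using
    simple distance-based amplitude panning, normalized to constant power."""
    msg = json.loads(control_message_json)
    src = np.array([msg["x"], msg["y"]])
    dist = np.linalg.norm(SPEAKERS - src, axis=1) + 1e-3  # avoid divide-by-zero
    gains = 1.0 / dist ** rolloff
    return gains / np.linalg.norm(gains)                  # constant-power normalization

# Example control message a mobile client might send:
print(mix_levels_for_source('{"source": 1, "x": 0.5, "y": 1.0}'))
```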