Monday, May 22, 09:30 — 12:30 (Gallery Window)
P16-01 MySofa—Design Your Personal HRTF
Christian Hoene (Presenting Author), Alexandru Cacerovschi (Author), Isabel C. Patino Mejia (Author)
Binaural auralizations appear in an increasing number of applications and devices. Although most of them use only generic Head-Related Transfer Functions (HRTFs), the recent standardization of the SOFA HRTF format has paved the way for broad support of individualized HRTFs. We have developed and implemented MySofa, a web service that helps users design a personal HRTF. Based on anthropometric measurements and user input, MySofa's algorithms calculate and tune HRTFs. The result is displayed in the web browser, and the user can listen to test renderings to verify whether the personalized HRTF matches their expectations. To foster the use of individualized HRTFs, we also implemented a lightweight C library called libmysofa, which helps programmers read SOFA files and look up FIR filters.
Convention Paper 9764
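The core task libmysofa performs for a renderer is mapping a desired source direction to the nearest measured FIR filter pair on the SOFA measurement grid. The following is a minimal, self-contained Python sketch of that nearest-neighbor lookup concept (toy data; it does not use libmysofa's actual C API):

```python
import numpy as np

def nearest_hrtf(positions, filters, direction):
    """Return the FIR filter pair measured closest to `direction`.

    positions: (N, 3) array of unit vectors for the measured directions
    filters:   (N, 2, taps) array of left/right impulse responses
    direction: (3,) unit vector pointing toward the desired source
    """
    # Largest dot product between unit vectors = smallest angular distance.
    idx = int(np.argmax(positions @ direction))
    return filters[idx, 0], filters[idx, 1]

# Toy measurement grid: front, left, right.
positions = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, -1.0, 0.0]])
filters = np.arange(3 * 2 * 4).reshape(3, 2, 4).astype(float)

# A source slightly left of front snaps to the "left" measurement.
left, right = nearest_hrtf(positions, filters, np.array([0.1, 0.99, 0.0]))
```

In practice, HRTF lookup may also interpolate between neighboring measurements and apply the stored onset delays; this sketch shows only the grid search.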
P16-02 Ecological Validity of Stereo UHJ Soundscape Reproduction
Francis Stevens (Presenting Author), Damian Murphy (Author), Stephen Smith (Author)
This paper presents the results of a study using a set of B-format soundscape recordings, presented in stereo UHJ format as part of an online listening test, to investigate the ecological validity of this method of soundscape reproduction. Test participants were presented with a set of soundscapes and asked to rate them using the Self-Assessment Manikin (SAM). These results were then compared with those from a previous study using the same soundscape recordings presented in a surround-sound listening environment, a method previously shown to be ecologically valid. Results show a statistically significant correlation between the SAM results for the two listening conditions, indicating that the stereo UHJ format is valid for soundscape reproduction.
Convention Paper 9765
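For reference, two-channel UHJ is derived from first-order B-format (W, X, Y) using the commonly cited Gerzon encoding equations (shown here to four decimal places; these values are from the general literature, not reproduced from this paper):

    S = 0.9397*W + 0.1856*X
    D = j*(-0.3420*W + 0.5099*X) + 0.6555*Y
    Left = (S + D) / 2,   Right = (S - D) / 2

where j denotes a wideband +90-degree phase shift (Hilbert transform). The Z (height) channel is discarded in the two-channel encode.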
P16-03 Comparison of HRTFs from a Dummy-Head Equipped with Hair, Cap, and Glasses in a Virtual Audio Listening Task over Equalized Headphones
György Wersényi (Presenting Author), József Répás (Author)
Head-Related Transfer Functions (HRTFs) are frequently used in virtual audio scene rendering to simulate sound sources at different spatial locations. The use of dummy-head HRTFs (also referred to as generic sets) is often criticized for poor localization performance, e.g., lower spatial resolution, in-the-head localization, and front-back reversals. This paper presents results of horizontal-plane localization obtained with digital filter representations of dummy-head HRTFs that were recorded both normally and with a cap, glasses, and hair added to the head. Results from untrained subjects listening over equalized reference headphones showed no significant difference among the HRTF sets despite large magnitude differences. This method of customizing generic HRTFs therefore fails where improved localization is needed.
Convention Paper 9766
P16-04 Filter Design of a Circular Loudspeaker Array Considering the Three-Dimensional Directivity Patterns Reproduced by Circular Harmonic Modes
Koya Sato (Presenting Author), Yoichi Haneda (Author)
We propose a filter design method for a circular loudspeaker array. This method is based on extended three-dimensional (3-D) bases that are observed by driving each circular harmonic mode using a prototypical circular loudspeaker array. When a desired 3-D directivity pattern is expanded in the extended 3-D bases, the filter coefficients can be obtained by combining the circular harmonics with the expansion coefficients of the desired 3-D directivity pattern. Moreover, the proposed method can suppress large filter gains at low frequencies by limiting the gain at each order (mode) using L1-norm optimization. We evaluated directivity and sound distortion using an actual 8-element circular loudspeaker array with a radius of 0.054 m. The results showed that the proposed method can control the 3-D directivity with little distortion.
Convention Paper 9767
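At the array-excitation level, the step of combining circular harmonic driving functions with a target pattern's expansion coefficients amounts to an inverse DFT over the loudspeaker angles. A minimal self-contained sketch of that single step (toy cardioid target; it omits the paper's frequency-dependent filters and L1-norm gain limiting):

```python
import numpy as np

L = 8                                    # loudspeakers on the circle
phis = 2 * np.pi * np.arange(L) / L      # element angles
modes = np.array([-1, 0, 1])             # circular harmonic orders used
coeffs = np.array([0.5, 1.0, 0.5])       # expansion of cardioid 1 + cos(phi)

# Driving weights: combine each mode e^{j m phi_l} with its coefficient.
weights = (coeffs[None, :] * np.exp(1j * np.outer(phis, modes))).sum(axis=1)

# At the element angles the excitation reproduces the target pattern.
target = 1.0 + np.cos(phis)
```

The real design additionally compensates each mode's frequency-dependent radiation, which is where the large low-frequency gains (and the need for regularization) arise.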
P16-05 Wearable Sound Reproduction System Using Two End-Fire Arrays
Kenta Imaizumi (Presenting Author), Yoichi Haneda (Author)
We propose a personal sound reproduction system that uses two wearable end-fire loudspeaker arrays instead of headphones to present the sound image. The prototype arrays rest on the listener's chest so that the look direction of each array points toward one of the listener's ears. To prevent sound leakage around the listener, we designed each array with narrow directivity. Moreover, we used a crosstalk canceler with head-related transfer functions to localize the sound image. We verified the performance of the prototype through simulations. A sound-pressure difference of approximately 15 dB was obtained between the look direction and other directions, and crosstalk was suppressed by approximately 10 to 20 dB. We also conducted a subjective listening test of sound image localization: the left/right correct-answer rate was approximately 90%, and the exact-match rate was approximately 40%.
Convention Paper 9768
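The basic mechanism behind an end-fire array's directivity is that steering delays align the elements' contributions along the array axis, while off-axis arrivals accumulate phase error per element. A self-contained sketch of a plain delay-and-sum end-fire response (arbitrary parameters chosen for illustration, not the paper's design):

```python
import numpy as np

c, f, d, N = 343.0, 4000.0, 0.02, 8   # speed of sound, frequency, spacing, elements
k = 2 * np.pi * f / c                 # wavenumber

def endfire_response(theta):
    """Normalized delay-and-sum end-fire response at angle theta (rad),
    theta = 0 being the look direction along the array axis."""
    n = np.arange(N)
    # Steering delays cancel the axial propagation; off the axis, each
    # element contributes a residual phase k*d*n*(cos(theta) - 1).
    phase = k * d * n * (np.cos(theta) - 1.0)
    return abs(np.exp(1j * phase).sum()) / N

on_axis = endfire_response(0.0)       # 1.0 by construction
back = endfire_response(np.pi)        # strongly attenuated at this frequency
```

The paper's arrays use a designed narrow directivity plus a crosstalk canceler on top of this principle.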
P16-06 Normalization Schemes in Ambisonic: Does it Matter?
Thibaut Carpentier (Presenting Author)
In the context of Ambisonic processing, various normalizations of the spherical harmonic functions have been proposed in the literature, and there is as yet no consensus in the community about which one should be preferred (if any). This is a frequent source of confusion for end users and may lead to compatibility issues between rendering engines. This paper reviews the different conventions in use, presents an extension of the FuMa scheme to higher orders, and discusses possible pitfalls in the decoding stage.
Convention Paper 9769
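As a concrete example of why the choice matters: SN3D and N3D components differ by a per-order factor of sqrt(2l+1), and FuMa additionally scales the order-0 (W) channel by 1/sqrt(2) relative to SN3D, so mixing conventions silently mis-weights the orders. A sketch of the SN3D-to-N3D rescaling, assuming ACN channel ordering:

```python
import numpy as np

def acn_order(acn):
    """Ambisonic order l of an ACN channel index: l = floor(sqrt(acn))."""
    return int(np.floor(np.sqrt(acn)))

def sn3d_to_n3d(channels):
    """Rescale SN3D-normalized channels (ACN order) to N3D: factor sqrt(2l+1)."""
    return np.array([x * np.sqrt(2 * acn_order(i) + 1)
                     for i, x in enumerate(channels)])

# First-order example: W (l = 0) is unchanged, Y/Z/X (l = 1) scale by sqrt(3).
sig = np.ones(4)
n3d = sn3d_to_n3d(sig)
```

A decoder built for one normalization applied to material in another therefore reproduces higher orders several dB too strong or too weak.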
P16-07 Perceptually Motivated Amplitude Panning (PMAP) for Accurate Phantom Image Localization
Hyunkook Lee (Presenting Author)
This paper proposes and evaluates a new constant-power amplitude-panning law named "Perceptually Motivated Amplitude Panning (PMAP)." The method is based on novel image shift functions that were derived from previous psychoacoustic experiments. The PMAP is also optimized for a loudspeaker setup with an arbitrary base angle using a novel phantom image localization model. Listening tests conducted using various sound sources suggest that, for the 60° base angle, the PMAP provides significantly better panning accuracy than the tangent law. For the 90° base angle, on the other hand, both panning methods perform equally well. The PMAP is considered useful for intelligent sound engineering applications, where accurate matching between target and perceived positions is important.
Convention Paper 9770
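PMAP's image shift functions are defined in the paper itself; for context, the tangent-law baseline it is compared against can be sketched as a constant-power panner for loudspeakers at +/- phi0:

```python
import numpy as np

def tangent_pan(phi, phi0=np.radians(30)):
    """Constant-power tangent-law gains for target angle phi (rad),
    loudspeakers at +/- phi0. Returns (g_left, g_right)."""
    r = np.tan(phi) / np.tan(phi0)    # tangent law: r = (gL - gR) / (gL + gR)
    gl, gr = 1.0 + r, 1.0 - r         # any gain pair satisfying that ratio
    norm = np.sqrt(gl**2 + gr**2)     # constant-power normalization
    return gl / norm, gr / norm

gl_c, gr_c = tangent_pan(0.0)                    # center: equal gains
gl_full, gr_full = tangent_pan(np.radians(30))   # hard left: (1, 0)
```

For a standard 60° stereo setup (phi0 = 30°), a center pan yields equal gains of 1/sqrt(2) and a pan to the loudspeaker angle yields a single active channel.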
P16-08 Full-Sphere Binaural Sound Source Localization by Maximum-Likelihood Estimation of Interaural Parameters
Benjamin Hammond (Presenting Author), Philip J. B. Jackson (Author)
Binaural recording technology offers an inexpensive, portable solution for spatial audio capture. In this paper a full-sphere 2D localization method is proposed that utilizes the Model-Based Expectation-Maximization Source Separation and Localization system (MESSL). The localization model is trained using a full-sphere head-related transfer function (HRTF) dataset and produces localization estimates by maximum-likelihood estimation of frequency-dependent interaural parameters. The model's robustness is assessed using matched and mismatched HRTF datasets between test and training data, with environmental sounds and speech. Results show that the majority of sounds are estimated correctly in the matched condition at low noise levels; in the mismatched condition, a "cone of confusion" arises, although lateral angles are still estimated effectively. Additionally, the results show a relationship between the spectral content of the test data and the performance of the proposed method.
Convention Paper 9771
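MESSL's probabilistic machinery is beyond the scope of an abstract, but the interaural parameters it models can be illustrated simply. A self-contained sketch estimating the interaural time difference (ITD) by cross-correlating the two ear signals (hypothetical toy signals, not the paper's method):

```python
import numpy as np

rng = np.random.default_rng(0)
fs = 16000
x = rng.standard_normal(fs)          # 1 s of noise reaching the left ear
itd_samples = 12                     # right ear lags by 12 samples (~0.75 ms)
y = np.concatenate([np.zeros(itd_samples), x[:-itd_samples]])

# Cross-correlate over a small lag range; the peak lag estimates the ITD.
max_lag = 40
lags = np.arange(-max_lag, max_lag + 1)
seg = x[max_lag:-max_lag]
corr = [np.dot(seg, y[max_lag + l: len(y) - max_lag + l]) for l in lags]
est_itd = lags[int(np.argmax(corr))]
```

The corresponding interaural level difference would be the per-frequency energy ratio of the two channels; MESSL models both parameters probabilistically per time-frequency bin rather than with a single broadband peak pick.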
P16-09 Spatial Quality and User Preference of Headphone Based Multichannel Audio Rendering Systems for Video Games: A Pilot Study
Joe Rees-Jones (Presenting Author), Damian Murphy (Author)
This paper presents a pilot experiment comparing the perceived spatial quality of, and preference for, virtualized 7.0 surround-sound video game audio against a stereo down-mix of the same material. The benefits of multichannel audio in gaming are clear: spatialized sound effects can be used to create immersive, dynamically reacting virtual environments while also offering competitive advantages. However, results from this study, based on feedback from 18 participants, suggest that the spatial quality of virtual 7.0 surround-sound is not perceived as significantly different from that of a stereo down-mix, and that neither rendering method is preferred. These results are surprising, as they call into question the current methods used to present spatial game audio over headphones.
Convention Paper 9772