
2018 AES International Conference on Spatial Reproduction — Aesthetics and Science

Programs: Papers


 

Program: Papers / Posters / e-Brief

P1: Binaural-1 [Day 1 (Aug 7) 10:30–12:10]

Chair: Brian Katz

[P1-1]
Title:
Evaluation of Robustness of Dynamic Crosstalk Cancellation for Binaural Reproduction
Authors:
Ryo Matsuda (Kyoto University), Makoto Otani (Kyoto University), and Hiraku Okumura (Yamaha Corporation / Kyoto University)
Abstract:
Binaural reproduction/synthesis is a promising way to present a spatial auditory scene to a listener. Binaural signals are generally reproduced over headphones. Alternatively, they can be presented over loudspeakers, without the listener wearing any equipment, by employing so-called crosstalk cancellation (CTC). The authors have been developing a dynamic CTC system that uses head tracking and real-time filter generation to allow the listener to move and rotate his/her head freely. However, the optimal loudspeaker positions for a single-listener dynamic CTC system have not been clarified. In this paper, we use a 122-channel loudspeaker array to evaluate the impact of loudspeaker arrangement on the robustness of the CTC system.
AES E-Library Permalink:
http://www.aes.org/e-lib/browse.cfm?elib=19598
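For readers new to CTC: at its core, a static two-loudspeaker canceller inverts, per frequency bin, the 2x2 matrix of acoustic paths from loudspeakers to ears. A minimal regularized sketch (illustrative only, not the authors' dynamic implementation; the transfer-matrix values below are hypothetical):

```python
import numpy as np

def ctc_filters(H, beta=1e-3):
    """Frequency-domain crosstalk-cancellation filters.

    H: (F, 2, 2) array of acoustic transfer functions, H[f, ear, speaker].
    Returns C so that H @ C ~ I (Tikhonov-regularized inverse; beta
    limits filter gain at ill-conditioned frequencies).
    """
    C = np.empty_like(H)
    I = np.eye(2)
    for f in range(H.shape[0]):
        Hf = H[f]
        C[f] = Hf.conj().T @ np.linalg.inv(Hf @ Hf.conj().T + beta * I)
    return C

# example: a single frequency bin with mild crosstalk (hypothetical values)
H = np.array([[[1.0 + 0j, 0.3 + 0.1j],
               [0.3 - 0.1j, 1.0 + 0j]]])
C = ctc_filters(H, beta=1e-8)
```

Feeding the binaural signals through C before the loudspeakers makes the net path H @ C close to identity, so each ear receives only its intended signal.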
[P1-2]
Title:
Optimization and prediction of the spherical and ellipsoidal ITD model parameters using offset ears
Authors:
Helene Bahu (Dysonics) and David Romblom (Dysonics)
Abstract:
In the absence of measurements, geometric head models offer a simple way to predict the ITD from a small number of parameters that can ideally be related to morphological quantities. This paper compares the ITD predictions of spherical and ellipsoidal head models with offset ears to the measured ITDs of thirty-seven subjects. The model parameters are first individually optimized, and regression analysis is then conducted to predict these parameters from anthropometric measurements. Lower global errors are obtained when the interaural axis is no longer constrained to intersect the center of the sphere or ellipsoid. The global error metrics of the two models are very similar but differ in their spatial distributions.
AES E-Library Permalink:
http://www.aes.org/e-lib/browse.cfm?elib=19599
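For reference, the classical spherical-head model that this paper refines predicts the far-field ITD from a single head-radius parameter (the Woodworth formula); the offset-ear and ellipsoidal variants studied here add further parameters. A sketch:

```python
import numpy as np

def spherical_itd(azimuth_rad, head_radius=0.0875, c=343.0):
    """Woodworth spherical-head ITD (seconds) for a far-field source.

    azimuth_rad: source azimuth from the median plane, |az| <= pi/2.
    Positive azimuth -> positive ITD (left-leading, by convention here).
    """
    th = np.abs(azimuth_rad)
    return np.sign(azimuth_rad) * (head_radius / c) * (th + np.sin(th))

itd_90 = spherical_itd(np.pi / 2)   # fully lateral source
```

With the default 8.75 cm radius this gives roughly 0.66 ms at 90 degrees, in line with typical adult ITD maxima.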
[P1-3]
Title:
The effects of dynamic and spectral cue on auditory vertical localization
Authors:
Jianliang Jiang (South China University of Technology), Bosun Xie (Acoustic Lab., School of Physics and Optoelectronics), Haiming Mai (South China University of Technology), Lulu Liu (South China University of Technology), Kailing Yi (South China University of Technology), and Chengyun Zhang (School of Mechanical & Electrical Engineering, Guangzhou University)
Abstract:
A series of virtual source localization experiments was carried out to examine the effects of dynamic and spectral cues on auditory vertical localization. Binaural signals were synthesized and rendered via a virtual auditory display under various combinations of conditions, including dynamic/static binaural synthesis; individualized, non-individualized, and pinna-less HRTFs; and stimuli with different bandwidths. Statistical analysis of the experimental results indicates that the dynamic cue in binaural signals enables moderate vertical localization even when the high-frequency spectral cue is completely eliminated. Combining dynamic and spectral cues further improves vertical localization, noticeably reducing up-down confusion and polar elevation error. Therefore, in addition to the high-frequency spectral cue, the dynamic cue caused by head turning contributes to vertical auditory localization.
AES E-Library Permalink:
http://www.aes.org/e-lib/browse.cfm?elib=19600
[P1-4]
Title:
Detection of a nearby wall in a virtual echolocation scenario based on measured and simulated OBRIRs
Author:
Annika Neidhardt (Technische Universität Ilmenau)
Abstract:
The paper presents a study on the perception of a nearby wall considering a speaking avatar in virtual acoustic environments (VAEs) for headphone reproduction. The avatar represents the user in the virtual world and can be controlled by authentic tracked movements. The only sound source is the avatar's mouth. The scenario is realized with dynamic binaural synthesis based on oral binaural room impulse responses (OBRIRs). A psychoacoustic experiment was conducted to test for audible cues of the nearby virtual wall at different distances in different rooms. Measured as well as simulated OBRIRs were taken into account. The results provide encouraging information to consider including echolocation aspects in the design of interactive virtual acoustic environments.
AES E-Library Permalink:
http://www.aes.org/e-lib/browse.cfm?elib=19601

P2: Binaural-2 [Day 1 (Aug 7) 13:10–14:00]

Chair: Jorge Treviño

[P2-1]
Title:
Dataset of near-distance head-related transfer functions calculated using the boundary element method
Authors:
César Salvador (Tohoku University), Shuichi Sakamoto (Tohoku University), Jorge Treviño (Tohoku University), and Yôiti Suzuki (Tohoku University)
Abstract:
Head-related transfer functions (HRTFs) are essential filters used in three-dimensional audio playback devices for personal use. They describe the transmission of sound from a point in space to the eardrums of a listener. Obtaining near-distance HRTFs is important to increase realism when presenting sounds in the peripersonal space. However, most publicly available datasets provide HRTFs only for sound sources beyond one meter. We introduce a collection of generic and individual HRTFs for circular and spherical distributions of point sources at distances ranging from 10 cm to 100 cm in 1 cm steps. The HRTF datasets were calculated using the boundary element method. A sample of the dataset is publicly available in the spatially oriented format for acoustics (SOFA).
AES E-Library Permalink:
http://www.aes.org/e-lib/browse.cfm?elib=19602
[P2-2]
Title:
Estimation of the spectral notch frequency of the individual early head-related transfer function in the upper median plane based on the anthropometry of the listener's pinnae
Authors:
Kazuhiro Iida (Chiba Institute of Technology), Masato Oota (Chiba Institute of Technology), and Hikaru Shimazaki (Chiba Institute of Technology)
Abstract:
In order to address individual differences in HRTFs across listeners, individualization of the HRTF based on the anthropometry of the listener's pinnae has been investigated. Using the early HRIR (Iida and Oota 2018), which is mainly composed of the response of the pinna, instead of the usual HRIR, estimation of the notches and peaks from the anthropometry of the listener's pinna is expected to be achievable over the entire median plane. The present study estimated the frequency of N1 (the lowest-frequency notch) of the listener's early HRTFs, whose duration is 1 ms, for seven vertical angles in the upper median plane from the anthropometry of the listener's pinnae.
AES E-Library Permalink:
http://www.aes.org/e-lib/browse.cfm?elib=19603
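Illustratively, N1 can be located by truncating the HRIR to its early 1 ms portion and scanning the magnitude spectrum for the first local minimum above a few kilohertz. A simplified sketch (not the authors' estimation procedure; the truncation and search floor are assumptions, and the example HRIR is synthetic):

```python
import numpy as np

def first_notch_freq(hrir, fs=48000, early_ms=1.0, fmin=4000.0, nfft=1024):
    """Estimate N1, the lowest-frequency notch of the early HRTF.

    Truncates the HRIR to `early_ms` after its main peak (the part
    dominated by the pinna response), then scans the magnitude
    spectrum above `fmin` for the first local minimum.
    """
    peak = int(np.argmax(np.abs(hrir)))
    n = int(fs * early_ms / 1000.0)
    seg = hrir[peak:peak + n]
    db = 20.0 * np.log10(np.abs(np.fft.rfft(seg, nfft)) + 1e-12)
    freqs = np.fft.rfftfreq(nfft, 1.0 / fs)
    start = max(int(np.searchsorted(freqs, fmin)), 1)
    for i in range(start, len(db) - 1):
        if db[i] < db[i - 1] and db[i] < db[i + 1]:
            return float(freqs[i])
    return None

# synthetic check: a 3-sample-delay reflection creates a notch near fs/6
h = np.zeros(256)
h[0], h[3] = 1.0, 0.9
n1 = first_notch_freq(h)
```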

P3: Psychoacoustics-1 [Day 1 (Aug 7) 16:00–17:15]

Chair: William Martens

[P3-1]
Title:
Evaluation of the frequency dependent influence of direct sound on the effect of sound projection
Authors:
Tom Wühle (TU Dresden), Sebastian Merchel (TU Dresden), and Ercan Altinsoy (TU Dresden)
Abstract:
Audio reproduction systems with sound projection create virtual sources by projecting sound onto reflective boundaries using highly focusing real sound sources. The aim of this reproduction approach is to generate auditory events located in the direction of the virtual sources rather than the direction of the real sources. However, the focusing capabilities of real sources, e.g., loudspeaker arrays, are physically limited. As a result, in practice the listener's perception is influenced not only by the projected sound but also by sound radiated directly from the real source. The present study investigated the frequency-dependent influence of the direct sound on the effect of sound projection using band-limited noise bursts and a method-of-adjustment procedure.
AES E-Library Permalink:
http://www.aes.org/e-lib/browse.cfm?elib=19604
[P3-2]
Title:
Influence of the frequency dependent directivity of a sound projector on the localization of projected sound
Authors:
Sebastian Merchel (TU Dresden), Tom Wühle (TU Dresden), Felix Reichmann (TU Dresden), and Ercan Altinsoy (TU Dresden)
Abstract:
This study investigates localization in a sound projection scenario. Real sound projectors, e.g., phased arrays, are characterized by a physically limited focusing ability. This leads to multiple sound propagation paths to the listener: projected sound via reflections and direct sound via the shortest path. The spectral characteristic of the direct sound is varied in a listening test to simulate realistic directivity limitations at low and high frequencies. The attenuation of the direct sound at mid-range frequencies that is just sufficient for the auditory event to be localized in the direction of the projection is determined.
AES E-Library Permalink:
http://www.aes.org/e-lib/browse.cfm?elib=19605
[P3-3]
Title:
A quantitative evaluation of media device orchestration for immersive spatial audio reproduction
Authors:
James Woodcock (University of Salford), Jon Francombe (BBC Research and Development), Richard Hughes (University of Salford), Russell Mason (University of Surrey), William J. Davies (University of Salford), and Trevor J. Cox (University of Salford)
Abstract:
Media device orchestration (MDO) is the concept of using an ad hoc array of devices to deliver or augment a media experience. This paper describes a multiple stimulus rating experiment designed to evaluate the overall quality of listening experience (QoLE) of a range of common home audio systems (mono, two-channel, and five-channel) and two audio MDO systems. Twenty-six experienced listeners performed the experiment in two listening positions (on and off sweet spot) for three program items. Kruskal-Wallis tests revealed significant differences in QoLE between the evaluated systems. Post-hoc Wilcoxon signed-rank tests suggested that MDO can perform similarly to two- and five-channel systems in the sweet spot listening position, and improve the QoLE off sweet spot.
AES E-Library Permalink:
http://www.aes.org/e-lib/browse.cfm?elib=19606

P4: Psychoacoustics-2 [Day 1 (Aug 7) 17:20–18:30]

Chair: Atsushi Marui

[P4-1]
Title:
Energy Aware Modelling of Inter-Aural Level Difference Distortion Impact on Spatial Audio Perception
Authors:
Pablo Delgado (International Audio Laboratories Erlangen), Jürgen Herre (Fraunhofer), Armin Taghipour (Fraunhofer), and Nadja Schinkel-Bielefeld (Fraunhofer)
Abstract:
In spatial audio processing, Inter-aural Level Difference Distortions (ILDD) between reference and coded signals play an important role in the perception of quality. To reduce costs, algorithms that predict the perceptual quality of multichannel/spatial audio processing without requiring extensive listening tests are developed. The correct modelling of the perceived ILDD greatly influences the prediction performance. We propose a model of ILDD perception including an energy dependency in different spectral regions of the signal. Model parameters are fitted to subjective results obtained from listening test data over a synthetic audio database with arbitrarily induced ILDD at different intensities, frequencies and energy levels. Finally, we compare our model performance over two databases of real coded signals with two state-of-the-art ILDD models.
AES E-Library Permalink:
http://www.aes.org/e-lib/browse.cfm?elib=19634
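For context, a band-wise ILD of the kind such distortion measures build on can be computed as the per-band energy ratio between the two ear signals. A minimal sketch (the band edges are illustrative assumptions, not the model's analysis bands):

```python
import numpy as np

def band_ild(left, right, fs=48000,
             bands=((250, 500), (500, 1000), (1000, 2000), (2000, 4000))):
    """Inter-aural level difference (dB) per frequency band.

    Positive values mean the left-ear signal is stronger in that band.
    """
    L = np.fft.rfft(left)
    R = np.fft.rfft(right)
    freqs = np.fft.rfftfreq(len(left), 1.0 / fs)
    out = []
    for f_lo, f_hi in bands:
        sel = (freqs >= f_lo) & (freqs < f_hi)
        e_l = np.sum(np.abs(L[sel]) ** 2) + 1e-12
        e_r = np.sum(np.abs(R[sel]) ** 2) + 1e-12
        out.append(10.0 * np.log10(e_l / e_r))
    return out

# broadband example: right ear attenuated by a factor of 2 (about 6 dB)
rng = np.random.default_rng(0)
left = rng.standard_normal(4800)
right = 0.5 * left
ilds = band_ild(left, right)
```

Comparing such band-wise ILDs between reference and coded signals gives the ILDD the model weights by signal energy.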
[P4-2]
Title:
The effect of characteristics of sound source on the perceived differences in the number of playback channels of multi-channel audio reproduction system
Authors:
Misaki Hasuo (WOWOW Inc.), Toru Kamekawa (Tokyo University of the Arts), and Atsushi Marui (Tokyo University of the Arts)
Abstract:
The effect of sound source characteristics on the impression of multichannel sound systems was examined through two experiments that tested whether differences in the number of playback channels can be perceived. Experiment 1 showed that the number of channel combinations with a low percentage of correct answers was largest for a single xylophone tone and smallest for a cello phrase. In Experiment 2, we focused on the harmonic structure of the sound source and used sawtooth waves with different numbers of overtones as well as pink noise. The results show that the temporal change of reflected sound energy from the lateral and overhead directions, and the time variation of the IACC, affect discrimination of the number of channels.
AES E-Library Permalink:
http://www.aes.org/e-lib/browse.cfm?elib=19607
[P4-3]
Title:
Discrimination of auditory spatial diffuseness facilitated by head rolling while listening to ‘with-height’ versus ‘without-height’ multichannel loudspeaker reproduction
Authors:
William Martens (The University of Sydney) and Yutong Han (The University of Sydney)
Abstract:
Listeners were asked to compare auditory spatial diffuseness between multichannel programs that were presented via three loudspeaker configurations, two of which included 'height channels' and one that was 'without-height' (the 'without-height' array employed loudspeakers located only on a single plane level with the listener's ears). In a double-blind, two-alternative forced choice (2AFC) task, it was found that listeners could discriminate a 'without-height' decrease in auditory spatial diffuseness to a significant extent when they rolled their heads from side to side. Those same listeners were unsuccessful in discriminating auditory spatial diffuseness when their heads were held stationary. Furthermore, when listeners pitched their heads up and down, no clear evidence for elevation-related diffuseness discrimination could be found.
AES E-Library Permalink:
http://www.aes.org/e-lib/browse.cfm?elib=19608

Poster-1 (Full Paper) [Day 1 (Aug 7) 14:15–15:45]

Chair: Akira Omoto

[PP-1]
Title:
Challenges of distributed real-time finite-difference time-domain room acoustic simulation for auralization
Authors:
Jukka Saarelma (Aalto University), Jonathan Califa (Oculus VR, Facebook), and Ravish Mehra (Oculus VR, Facebook)
Abstract:
Large-scale wave-based room acoustic simulations are commonly considered computationally expensive. Distributed computing can be used to accommodate such methods for wide-bandwidth simulation, but may pose several challenges for real-time execution, such as communication and synchronization latencies between the computing hardware. In this paper, a system for real-time auralization of distributed finite-difference time-domain room acoustic simulation is proposed. Several experiments are conducted to measure the performance of the system. A proof-of-concept application where simulated responses from the distributed system are streamed to a virtual reality scene and auralized is constructed. It is found that distributed computing scales well with large off-line simulations, but synchronization latencies become a limiting factor in the implementation of a real-time system.
AES E-Library Permalink:
http://www.aes.org/e-lib/browse.cfm?elib=19609
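As background, the FDTD method being distributed here advances a pressure grid with a leapfrog update. A minimal 1-D sketch (the paper's solver is 3-D and wide-band; this only illustrates the update rule and its exact-propagation property at Courant number 1):

```python
import numpy as np

def fdtd_1d(p, p_prev, steps, courant=1.0):
    """Minimal 1-D FDTD wave solver (leapfrog update, rigid ends).

    Update: p_next = 2p - p_prev + C^2 * laplacian(p).
    At C = 1 the scheme advects a one-way pulse exactly one cell per
    time step, which makes a simple correctness check.
    """
    p_prev = p_prev.copy()
    p = p.copy()
    c2 = courant ** 2
    for _ in range(steps):
        lap = np.zeros_like(p)
        lap[1:-1] = p[:-2] - 2.0 * p[1:-1] + p[2:]
        p_prev, p = p, 2.0 * p - p_prev + c2 * lap
    return p

# a rightward-moving unit pulse: at cell 50 now, at cell 49 one step ago
p0 = np.zeros(101); p0[50] = 1.0
pm1 = np.zeros(101); pm1[49] = 1.0
out = fdtd_1d(p0, pm1, steps=10)   # pulse should arrive at cell 60
```

Distributing such a solver means partitioning the grid across machines and exchanging boundary layers each step, which is where the synchronization latencies discussed in the paper arise.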
[PP-2]
Title:
The Number of Virtual Loudspeakers and the Error for Spherical Microphone Array Recording and Binaural Rendering
Authors:
Jianliang Jiang (South China University of Technology), Bosun Xie (Acoustic Lab., School of Physics and Optoelectronics), and Haiming Mai (South China University of Technology)
Abstract:
Spherical microphone array recording and binaural rendering (SMARBR) is a spatial audio technique. It is implemented by first using an Ambisonics (beamforming) scheme to convert the outputs of the microphone array into loudspeaker signals, which are then converted into binaural signals via the virtual loudspeaker scheme. The present work evaluates the influence of the number of loudspeakers on the error in binaural pressures for SMARBR. The results indicate that, for a given order of the Ambisonics scheme, when the number of virtual loudspeakers exceeds the minimal requirement, appropriately increasing the number of virtual loudspeakers reduces the mean error around and above the high-frequency limit imposed by the spatial sampling theorem.
AES E-Library Permalink:
http://www.aes.org/e-lib/browse.cfm?elib=19610
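As background to the "minimal requirement" mentioned in the abstract: decoding order-N Ambisonics needs at least as many virtual loudspeakers as there are spherical-harmonic channels, i.e. (N+1)^2:

```python
def min_virtual_loudspeakers(order):
    """Minimum number of (virtual) loudspeakers for an order-N Ambisonics
    decode: at least as many as the (N+1)^2 spherical-harmonic channels."""
    return (order + 1) ** 2

counts = [min_virtual_loudspeakers(n) for n in range(1, 5)]
```

For example, first order needs at least 4 virtual loudspeakers and fourth order at least 25; the paper examines what is gained by going beyond these minima.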
[PP-3]
Title:
Effect of individualized head-related transfer functions on distance perception in virtual reproduction for a nearby source
Authors:
Liliang Wang (South China University of Technology) and Guangzheng Yu (South China University of Technology)
Abstract:
The head-related transfer function (HRTF) is affected by the anatomical parameters of an individual subject and is thus worth considering in virtual reproduction based on HRTF signal processing. This work aims to determine the effect of individualized HRTF cues on auditory distance perception for a nearby sound source by means of psychoacoustic experiments. The individualized and non-individualized HRTFs of 6 subjects, at 7 distances and 5 lateral azimuths, were convolved with white noise and used as stimuli. The results indicate that the individualized HRTF cue has no significant effect on auditory distance perception of nearby sources, and thus it may be of little concern in virtual reproduction in most cases.
AES E-Library Permalink:
http://www.aes.org/e-lib/browse.cfm?elib=19611
[PP-4]
Title:
Controlling The Apparent Source Size In Ambisonics Using Decorrelation Filters
Authors:
Timothy Schmele (Eurecat) and Umut Sayin (Eurecat)
Abstract:
The apparent source extent, defined as the perceived sound source size, is a common parameter for 3D audio applications, such as VR, 3D cinema and spatialized music composition. Here, we present a new method for achieving this effect for ambisonics, which targets the minimization of the interaural cross-correlation coefficient. It achieves this by reducing the ambisonic order and rotating decorrelated copies of the original stream into an optimal configuration. Comparisons to older methods as well as different filter parameters and parameter curves are discussed for both binaural and multichannel speaker systems. Results indicate that this method is able to outperform previous ones, particularly in the gradual change in source size and listening area stability.
AES E-Library Permalink:
http://www.aes.org/e-lib/browse.cfm?elib=19612
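For context, the interaural cross-correlation coefficient the method seeks to minimize is the maximum of the normalized cross-correlation between the two ear signals over lags within about +/-1 ms. A direct sketch (a generic definition, not the authors' decorrelation filters):

```python
import numpy as np

def iacc(left, right, fs=48000, max_lag_ms=1.0):
    """Interaural cross-correlation coefficient: the maximum of the
    normalized cross-correlation over lags within +/- max_lag_ms."""
    max_lag = int(fs * max_lag_ms / 1000.0)
    norm = np.sqrt(np.sum(left ** 2) * np.sum(right ** 2)) + 1e-12
    best = 0.0
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            c = np.sum(left[lag:] * right[:len(right) - lag])
        else:
            c = np.sum(left[:lag] * right[-lag:])
        best = max(best, abs(c) / norm)
    return best

rng = np.random.default_rng(1)
x = rng.standard_normal(48000)
y = rng.standard_normal(48000)
coherent = iacc(x, x)        # identical ear signals -> IACC near 1
decorrelated = iacc(x, y)    # independent noises  -> IACC near 0
```

Lower IACC corresponds to a more diffuse, spatially extended percept, which is why the method widens apparent source size by rotating decorrelated copies into the scene.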
[PP-5]
Title:
Sound field renderer with loudspeaker array using real-time convolver
Authors:
Takao Tsuchiya (Doshisha Univ.), Issei Takenuki (Doshisha Univ.), and Kyousuke Sugiura (Doshisha Univ.)
Abstract:
In this study, a sound field renderer that can create a virtual sound field from real-world sound input is developed. The system consists of a 24-channel loudspeaker array, 4 microphones, and a real-time convolver. The audio signals received at the microphones are convolved in real time with impulse responses calculated by sound field rendering. A rendering model of the Yamaha Hall in Tokyo was made; the calculated and measured reverberation curves show good agreement. The delay of the system was evaluated as 7.8 ms, which should not affect perception. It is shown that the renderer can produce a highly immersive virtual sound field.
AES E-Library Permalink:
http://www.aes.org/e-lib/browse.cfm?elib=19613
[PP-6]
Title:
Proposal of a sound source separation method using image signal processing of a spatio-temporal sound pressure distribution image
Authors:
Kenji Ozawa (University of Yamanashi), Masaaki Ito (University of Yamanashi), Genya Shimizu (University of Yamanashi), Masanori Morise (University of Yamanashi), and Shuichi Sakamoto (Tohoku University)
Abstract:
This paper proposes a sound source separation method using a microphone array and image signal processing. First, a spatio-temporal sound pressure distribution (STSPD) image is formed from the microphone outputs. A two-dimensional fast Fourier transform (2D FFT) converts this image into a spectrum in which sounds from different directions are naturally separated into components on different lines. To separate the sound sources, each line in the spectrum is extracted and a 2D inverse FFT is applied. A method to restore a fine STSPD image from a sparse microphone array is also proposed. Although the basic performance of the proposed method is comparable to a conventional delay-and-sum array, more sophisticated methods can be applied for improved performance.
AES E-Library Permalink:
http://www.aes.org/e-lib/browse.cfm?elib=19614
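The core observation can be illustrated synthetically: a far-field tone from direction theta reaches microphone m with delay m*d*sin(theta)/c, so in the 2-D spectrum of the space-time pressure image it concentrates near a single (spatial, temporal) frequency pair, and different directions fall on different lines. A minimal sketch (array geometry and signal parameters are arbitrary choices, not the paper's setup):

```python
import numpy as np

fs, c, d = 48000.0, 343.0, 0.04      # sample rate, sound speed, mic spacing
M, N = 32, 512                       # microphones, time samples
f0, theta = 3000.0, np.deg2rad(30.0)

# space-time pressure image of one plane-wave tone
m = np.arange(M)[:, None]
n = np.arange(N)[None, :]
img = np.sin(2.0 * np.pi * f0 * (n / fs - m * d * np.sin(theta) / c))

# its 2-D spectrum peaks at one spatial/temporal frequency pair
spec = np.abs(np.fft.fft2(img))
k_idx, f_idx = np.unravel_index(np.argmax(spec), spec.shape)
f_est = min(f_idx, N - f_idx) * fs / N                  # fold mirror peak
sin_est = min(k_idx, M - k_idx) / M * c / (f0 * d)
theta_est = np.degrees(np.arcsin(sin_est))              # recovered direction
```

Masking individual lines in this spectrum and inverse-transforming recovers the per-direction components, which is the separation principle the abstract describes.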

P5: Game Audio [Day 2 (Aug 8) 10:00–12:05]

Chair: Joel Douek

[P5-1]
Title:
Personality and listening sensitivity correlates of the subjective response to real and simulated sound sources in virtual reality
Authors:
J. William Bailey (University of Salford) and Bruno Miguel Fazenda (University of Salford)
Abstract:
Some personality traits correlate with presence in virtual reality. However, it is not clear whether this is true for externalisation of spatial audio. The aim of this paper is to investigate whether such relationships exist by comparing the evaluation of the spatial quality of audio in a virtual-reality-based externalisation test with personality and cognitive factors. It was found that when the audio was perceived as a real source, aural sensitivity was correlated with externalisation. When the audio was rated as originating from headphones, immersive tendencies were associated with awareness of listening to headphones. The results suggest that the psychological aspects related to presence are not the same as those associated with the experience of perceiving externalisation.
AES E-Library Permalink:
http://www.aes.org/e-lib/browse.cfm?elib=19615
[P5-2]
Title:
A Low Cost Method for Applying Acoustic Features to Each Sound in Virtual 3D Space Using Layered Quadtree Grids
Authors:
Kenji Kojima (CAPCOM Co., Ltd.) and Takahiro Kitagawa (CAPCOM Co., Ltd.)
Abstract:
In order to use spatial audio in virtual 3D spaces, such as video games, it is important to take into account the positions of the sound sources and listener. Particularly, for object-based audio, acoustic features must be applied to each source individually. At this time, processing unnecessary sources wastes CPU resources. This paper proposes low cost methods for determining the sound propagation paths using Layered Quadtree Grids and enabling sound designers to design more realistic sounds by using audio buses and sound effects. By focusing on space and openings, it shows how to dynamically reduce processing for unnecessary sources and determine propagation paths. Experiments show this method processes each source more than ten times faster than the conventional method.
AES E-Library Permalink:
http://www.aes.org/e-lib/browse.cfm?elib=19616
[P5-3]
Title:
Proximity representation of VR world — Realization of the real-time binaural for video games
Author:
Kentaro Nakashima (CAPCOM Co., Ltd.)
Abstract:
Using a 3D-audio plugin is by far the most common technique for achieving stereophonic sound over headphones in video game development. However, such methods tend to make audio sources sound farther away and more diffuse than their actual distance warrants. We consider reproducing close sound to be particularly important in game production, and to achieve it we developed a real-time binaural method that realizes convincing close sound. We also created a prototype to evaluate the method, conducting an assessment test with 30 people. Thereby, we fixed the issue and improved the immersion of 3D audio.
AES E-Library Permalink:
http://www.aes.org/e-lib/browse.cfm?elib=19617
[P5-4]
Title:
Impact of HRTF individualization on player performance in a VR shooter game
Authors:
David Poirier-Quinot (Sorbonne Université, CNRS, Institut Jean Le Rond d'Alembert) and Brian Katz (Sorbonne Université, CNRS, Institut Jean Le Rond d'Alembert)
Abstract:
This study assessed the impact of individualized binaural rendering on player performance in the context of a VR "shooter game" as part of a larger project to characterize the impact of binaural rendering quality in various VR applications. Participants played a game in which they were faced with successive enemy targets approaching from random directions on a sphere. Audio-visual cues allowed for target localization. Participants were equipped with an Oculus CV1 HMD, headphones, and two Oculus Touch hand-tracked devices as targeting mechanisms. Participants performed two sessions, once with their best-match and once with their worst-match HRTF from a "perceptually orthogonal" optimized set of 7 HRTFs [Katz 2012]. Results indicate significant performance improvement (speed and movement efficiency) with best-match HRTF binaural rendering sessions.
AES E-Library Permalink:
http://www.aes.org/e-lib/browse.cfm?elib=19618
[P5-5]
Title:
The Effect of Virtual Environments on Localization during a 3D Audio Production Task
Authors:
Matthew Boerum (McGill University), Jack Kelly (McGill University), Diego Quiroz (McGill University), and Patrick Cheiban (McGill University)
Abstract:
In a perceptual study of three-dimensional audio for virtual reality (VR), an experiment was conducted to investigate the influence of virtual environments on localization during a production-based panning task. It was hypothesized that performing this task in VR would take longer and be less accurate than if conducted in a real environment. Using a 3D loudspeaker array and hardware panning controls, 80 participants were asked to repeatedly match probe sounds to the location of a randomly positioned target sound. Participants performed the task with and without awareness of loudspeaker position. Half were presented a VR replica of the environment using a head mounted display. Results showed that virtual reality did not significantly inhibit task performance.
AES E-Library Permalink:
http://www.aes.org/e-lib/browse.cfm?elib=19619

P6: Psychoacoustics-3 [Day 2 (Aug 8) 14:45–15:35]

Chair: Kenji Ozawa

[P6-1]
Title:
Recognition of an auditory environment: investigating room-induced influences on immersive experience
Authors:
Sungyoung Kim (Rochester Institute of Technology), Richard King (McGill University), Toru Kamekawa (Tokyo University of the Arts), and Shuichi Sakamoto (Tohoku University)
Abstract:
Four distinct multi-channel playback rooms were compared in this study. In each room, we played three classical music pieces mixed for both 22.2-channel and 2-channel playback and generated corresponding binaural recordings. Fifteen trained listeners participated in a listening experiment in which they compared the four rooms on five salient attributes: perceived width, perceived depth, spatial clarity, listener envelopment, and timbral balance of the sound field. A factor analysis shows that two factors accounted for about 70% of the total variance. The first factor is associated with the spatial dimension of a sound field, and the second factor with the spectral and spatial integration (or blend) of sound components.
AES E-Library Permalink:
http://www.aes.org/e-lib/browse.cfm?elib=19620
[P6-2]
Title:
On Human Perceptual Bandwidth and Slow Listening
Authors:
Thomas Lund (Genelec OY) and Aki Mäkivirta (Genelec OY)
Abstract:
The brain has only ever learned about the world through our five primary senses. Through them we receive just a fraction of the information actually available, and we perceive far less still: a fraction of a fraction, the perceptual bandwidth. Conscious perception is also profoundly influenced by long-term experience and learning, to the extent that perception might be more accurately understood and studied primarily as an inside-out phenomenon. Summarizing a review of physiological, clinical, and psychological research, the paper proposes three types of listening strategies that should be distinguished when conducting subjective tests: easy listening, trained listening, and slow listening.
AES E-Library Permalink:
http://www.aes.org/e-lib/browse.cfm?elib=19621

P7: Ambisonics-1 [Day 2 (Aug 8) 15:45–17:00]

Chair: Shuichi Sakamoto

[P7-1]
Title:
Directional filtering of Ambisonic sound scenes
Authors:
Pierre Lecomte (Centre Lyonnais d'Acoustique), Philippe-Aubert Gauthier (Université de Sherbrooke), Alain Berry (Université de Sherbrooke), Alexandre Garcia (Conservatoire National des Arts et Métiers), and Christophe Langrenne (Conservatoire National des Arts et Métiers)
Abstract:
This paper presents an approach for directional filtering of an Ambisonic sound scene. The filtering is described in the angular domain and is carried out in the Ambisonic domain via a spherical Fourier transform. When dealing with finite-order Ambisonics, this operation can be expressed as a matrix operation. Two directional filtering cases are discussed: a directional Dirac function and a beampattern with a known spherical Fourier transform. Theoretical analysis shows that the order required after directional filtering should increase to avoid loss of information. An implementation is introduced that allows variable selectivity of the directional filtering along with real-time interaction.
AES E-Library Permalink:
http://www.aes.org/e-lib/browse.cfm?elib=19622
[P7-2]
Title:
Spatialization Pipeline for Digital Audio Workstations with Three Spatialization Techniques
Authors:
Tanner Upthegrove (Virginia Tech), Charles Nichols (Virginia Tech), and Michael Roan (Virginia Tech)
Abstract:
As more artists find utility in spatial audio for producing live and recorded music, new tools are needed to meet the aesthetic goals of each artist. This paper proposes a method for controlling and mixing point-source spatial audio with the familiar interface of Cockos Reaper, a digital audio workstation (DAW), which controls a Cycling '74 Max patch rendering Vector Base Amplitude Panning (VBAP), High Order Ambisonics (HOA), and 3-D convolution reverberation mixing busses. The Max patch allows mixes in Reaper to translate to any arbitrary loudspeaker array by modification of the quantity and location of loudspeakers.
AES E-Library Permalink:
http://www.aes.org/e-lib/browse.cfm?elib=19623
[P7-3]
Title:
Size of the accurate-reproduction area as function of directional metrics in spherical harmonic domain
Authors:
Pierre Grandjean (Université de Sherbrooke), Philippe-Aubert Gauthier (Université de Sherbrooke), and Alain Berry (Université de Sherbrooke)
Abstract:
In spherical harmonic sound field synthesis, spatial discretization of the sphere and truncation order decrease the physical performance in terms of acoustical pressure reproduction error. A simple rule allows for the prediction of the listening area's size so that the deviation between the target and reconstructed sound pressure fields stays below a given level. However, this rule does not quantify the area of effective reproduction in terms of directional fields such as particle velocity or acoustical intensity. In this paper, a new rule for the effective reconstruction of particle velocity, acoustic intensity, and related directional metrics is derived and is found to be more severe than the usual pressure reconstruction. Numerical simulation of the directional fields and metrics are reported.
AES E-Library Permalink:
http://www.aes.org/e-lib/browse.cfm?elib=19624
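The "simple rule" the abstract refers to is commonly stated as kr ≲ N: a spherical harmonic expansion truncated at order N reproduces the pressure field accurately within a radius r of the origin, where k is the wavenumber. A minimal sketch of that rule of thumb (the speed of sound and the example values are the usual conventions, not figures from the paper):

```python
import math

def sweet_spot_radius(order, freq_hz, c=343.0):
    """Radius (m) of the accurate pressure-reproduction area for spherical
    harmonic truncation order N, from the rule of thumb k*r <= N."""
    k = 2.0 * math.pi * freq_hz / c  # wavenumber
    return order / k

r = sweet_spot_radius(4, 1000.0)  # a 4th-order system at 1 kHz: ~0.22 m
```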

P8: Ambisonics-2 VR [Day 2 (Aug 8) 17:10–18:00]

Chair: Hyunkook Lee

[P8-1]
Title:
Extending the listening region of high-order Ambisonics through series acceleration
Authors:
Jorge Treviño (Tohoku University), Shuichi Sakamoto (Tohoku University), and Yôiti Suzuki (Tohoku University)
Abstract:
Sound field reproduction using high-order Ambisonics (HOA) is limited to a small listening region. This is a consequence of HOA relying on a loudspeaker array to re-create, term-by-term, the lower orders of the sound field's multipole expansion. However, results from perturbation theory show that asymptotic series can be transformed into new series with faster convergence; this is known as series acceleration. This study considers the use of series acceleration to improve the accuracy of sound field reproduction away from the origin. This leads to a larger listening region compared to conventional HOA reproduction. The improvement, as well as the robustness, of the proposal are evaluated through numerical experiments which simulate arrays of microphones and loudspeakers.
AES E-Library Permalink:
http://www.aes.org/e-lib/browse.cfm?elib=19625
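Series acceleration itself is a generic numerical technique. As a hedged illustration (the abstract does not specify which transform the authors use), here is the classic Shanks transformation applied to a slowly converging alternating series:

```python
import math

def shanks(seq):
    """One pass of the Shanks transformation over a sequence of partial
    sums; each output term extrapolates three consecutive inputs."""
    out = []
    for i in range(1, len(seq) - 1):
        num = seq[i + 1] * seq[i - 1] - seq[i] ** 2
        den = seq[i + 1] + seq[i - 1] - 2.0 * seq[i]
        out.append(num / den)
    return out

# Partial sums of the slowly converging alternating series for ln(2).
partial, s = [], 0.0
for n in range(1, 12):
    s += (-1) ** (n + 1) / n
    partial.append(s)

accelerated = shanks(shanks(partial))  # two passes
```

Two passes over eleven terms already beat the raw partial sums by several orders of magnitude, which is the kind of convergence gain the paper exploits away from the origin.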
[P8-2]
Title:
Psychophysical validation of binaurally processed sound superimposed upon environmental sound via an unobstructed pinna and an open-ear-canal earspeaker
Authors:
Frederico Pereira (The University of Sydney) and William Martens (The University of Sydney)
Abstract:
In most headphone-based spatial auditory display applications that use Head Related Transfer Functions (HRTFs), the presence of the headphone interferes with the listener's natural spatial hearing abilities with regard to the “live” sound within their immediate environment. This paper describes an application of binaural technology in which a novel earspeaker reproduction system delivers sound at the ear canal entrance while exhibiting minimal acoustical interference with the listener's Anatomical Transfer Function (ATF). The system enables the creation of an effective auditory augmented reality for the listener in which virtual sound sources are displayed via HRTF-based processing, and the resulting binaural images are superimposed upon environmental sounds that are naturally transformed by the ATF of the listener.
AES E-Library Permalink:
http://www.aes.org/e-lib/browse.cfm?elib=19626

Poster-2 (eB) [Day 2 (Aug 8) 13:15–14:30]

[EB1-1]
Title:
Acoustic characteristics of headphones and earphones
Authors:
Yoshiharu Soeta (National Institute of Advanced Industrial Science and Technology) and Sho Ooi (National Institute of Advanced Industrial Science and Technology)
Abstract:
Comparing the acoustic characteristics of headphones and earphones is a major concern during design or product selection. Several specifications exist for these characteristics, such as frequency response, impedance, and sensitivity, but they rarely provide a basis for comparison between devices. In this study, we compared headphones and earphones through acoustical measurements to find the acoustical factors that explain their characteristics. Impulse responses, correlation factors, and psychoacoustic factors were analyzed. The differences among headphones and earphones were represented by the early-to-late energy ratio, the magnitude of the interaural cross-correlation function, and other factors.
AES E-Library Permalink:
http://www.aes.org/e-lib/browse.cfm?elib=19641
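One of the quantities mentioned, the magnitude of the interaural cross-correlation function, can be sketched as below. This is a generic textbook computation, not code from the study; the lag window and sample values are assumptions.

```python
import math

def iacc(left, right, max_lag=44):
    """Maximum of the normalized interaural cross-correlation within
    +/- max_lag samples (roughly +/-1 ms at 44.1 kHz)."""
    energy = math.sqrt(sum(x * x for x in left) * sum(x * x for x in right))
    best = 0.0
    for lag in range(-max_lag, max_lag + 1):
        s = 0.0
        for n, x in enumerate(left):
            m = n + lag
            if 0 <= m < len(right):
                s += x * right[m]
        best = max(best, abs(s) / energy)
    return best

sig = [0.0, 1.0, 0.5, 0.25, 0.1]
iacc_same = iacc(sig, sig)               # identical ear signals: IACC = 1
iacc_shift = iacc(sig, [0.0] * 3 + sig)  # pure delay within the window: 1
```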
[EB1-2]
Title:
Tetrahedral Microphone: A Versatile “Spot” and Ambience Receiver for 3D Mixing and Sound Design
Authors:
Wieslaw Woszczyk (McGill University), Jennifer Nulsen (McGill University), Ephraim Hahn (McGill University), Haruka Nagata (McGill University), Kseniya Degtyareva (McGill University), and John Castillo (McGill University)
Abstract:
With the introduction of affordably priced, high-quality 3D microphones, such as Sennheiser's Ambeo VR, it becomes interesting to consider whether to use such 3D receivers as support microphones and as dedicated extractors of ambience. The precise tetrahedral orientation of the four cardioid capsules in such a microphone gives it a spatial spreading advantage unavailable in a single-capsule spot microphone. The question is what the best practice should be for integrating such a 3D microphone into the ecosystem of surround sound with height, which is primarily based on single-capsule transducers. The authors present the results of several recording experiments and offer recommendations for best practices in 3D mixing using tetrahedral microphones.
AES E-Library Permalink:
http://www.aes.org/e-lib/browse.cfm?elib=19642
[EB1-3]
Title:
A Mid-Air Gestural Controller for the Pyramix 3D panner
Author:
Diego I Quiroz Orozco (McGill University / CIRMMT)
Abstract:
Input devices that operate in three dimensions with six degrees of freedom (6DOF) have been evolving over the past several decades, and much research has been done on capturing the gestures of performers in an effort to re-map them to digital instruments, but little for 3D audio. The practice of 3D audio mixing and spatialization is no different from a musical performance in this respect: the gestures used by engineers are expressive and have complex metaphorical significance. The purpose of this project includes developing a 6DOF gestural controller utilising the Leap Motion® controller for the digital audio workstation Pyramix®, which will work as a MIDI overlay gestural control system to be used as a multi-dimensional controller and automation recorder.
AES E-Library Permalink:
http://www.aes.org/e-lib/browse.cfm?elib=19643
[EB1-4]
Title:
Recording and mixing technics for ambisonics
Author:
Tomoyuki Nishiyama (NHK)
Abstract:
NHK has been working on the production of immersive audio content in 22.2 multichannel sound and 8K UHD-TV. We have also made several VR contents in the Ambisonics format that were directly related to broadcast programs. In this paper, sound-making techniques for VR content, including microphone arrangements, sound design, and mixing, are explained. We also consider what most strongly affected the quality of audio immersiveness in the production process.
AES E-Library Permalink:
http://www.aes.org/e-lib/browse.cfm?elib=19644
[EB1-5]
Title:
Analysis of Sound Localization on Median Plane by using Specular Reflection Shape Features
Authors:
Tatsuya Shibata (Tokyo Denki University), Yuko Watanabe (Tokyo Denki University), and Akito Michishita (Route Theta)
Abstract:
We hypothesize that pinna shape features depend on sound direction, because localization ability in the median plane varies with direction. We therefore apply the specular reflection shape method, developed for 3D shape models under varying light directions, to pinna shape features by substituting sound directions for light directions and the hearing point for the viewpoint. We analyze the relationship between sound localization ability and the specular reflection shape feature descriptors. We find that the specular reflection shape method is related to sound localization ability in the median plane, and that earlobe transformation and head movements modify that ability.
AES E-Library Permalink:
http://www.aes.org/e-lib/browse.cfm?elib=19645
[EB1-6]
Title:
Movie Scene Classification using Audio Signals in a Home-Theater System
Authors:
Yuta Yuyama (Yamaha Corporation), Sungyoung Kim (Rochester Institute of Technology), and Hiraku Okumura (Yamaha Corporation)
Abstract:
Various methods for achieving a high sense of realism have been studied. We propose Yamaha's CINEMA DSP for its highly realistic reproduction in home theaters. Movie content contains a variety of scenes, and to obtain a highly realistic impression, processing must be optimized for each scene. We therefore propose an automatic scene classification method for cinema content that optimizes post-processing. As a result, optimal processing can be applied to each scene, yielding greater realism.
AES E-Library Permalink:
http://www.aes.org/e-lib/browse.cfm?elib=19646
[EB1-7]
Title:
Accurate spatialization of VR sound sources in the near field
Authors:
Camilo Arévalo (Pontificia Universidad Javeriana Cali), Gerardo M. Sarria M. (Pontificia Universidad Javeriana Cali), and Julián Villegas (University of Aizu)
Abstract:
Spatialization of sound sources is an important but often overlooked aspect for virtual environments. Video game engines have sound spatializers that are efficient and accurate for sound sources in the far field, but spatialization of sources in the near field are subpar. The aim of this research is to offer a spatialization alternative in VR engines such as Unity. We created HRIR_U, an API for the spatialization of multiple sources in the near field in real-time. In this paper, we detail the development of HRIR_U and present a comparison between perceived distance accuracy achieved with native methods in Unity and that achieved with HRIR_U. This comparison indicates that HRIR_U could offer better spatialization, though scalability and performance should be improved.
AES E-Library Permalink:
http://www.aes.org/e-lib/browse.cfm?elib=19647
[EB1-8]
Title:
Creating a highly-realistic “Acoustic Vessel Odyssey” using Sound Field Synthesis with 576 Loudspeakers
Authors:
Yuki Mitsufuji (Sony Corporation), Asako Tomura (Sony Corporation), and Kazunobu Ohkuri (Sony Corporation)
Abstract:
“Acoustic Vessel Odyssey” is a sound installation realizing the future of music by using Sony's spatial audio technology called Sound Field Synthesis (hereafter SFS). It enables creators to simulate popping, moving, and partitioning sounds in one space. At the event where we demonstrated “Acoustic Vessel Odyssey”, the immersive experience provided by SFS technology was further enhanced by a new, specially designed loudspeaker array consisting of 576 loudspeakers. The content was choreographed by sound artist Evala, accompanied by a light installation created by digital media artists Kimchi and Chips. In this paper, we report the details of the system architecture as well as the technical requirements of “Acoustic Vessel Odyssey”.
AES E-Library Permalink:
http://www.aes.org/e-lib/browse.cfm?elib=19648
[EB1-9]
Title:
Loudspeaker Positions with Sufficient Natural Channel Separation for Binaural Reproduction
Authors:
Kat Young (University of York), Gavin Kearney (University of York), and Anthony I. Tew (University of York)
Abstract:
Natural channel separation (NCS) refers to the level of acoustic isolation which exists naturally between the ears for a single sound source. To the authors' knowledge, no systematic study has been undertaken to identify source positions which can produce the required level of NCS for binaural reproduction to be achieved without using crosstalk cancellation. The transfer functions of 655,214 loudspeaker positions were simulated using the boundary element method and the NCS calculated for each. For loudspeaker positions under 0.5m from the head there is a clear inverse relationship between NCS and distance. Close to the head, many positions exceed the 20dB NCS required. Results suggest that near-field binaural reproduction may be implemented without crosstalk cancellation, subject to further perceptual testing.
AES E-Library Permalink:
http://www.aes.org/e-lib/browse.cfm?elib=19649
[EB1-10]
Title:
Quality discrimination on high-resolution audio with difference of quantization accuracy by sound-image localization
Authors:
Akio Suguro (Hachinohe Institute of Technology) and Masanobu Miura (Hachinohe Institute of Technology)
Abstract:
High-resolution audio has finer time and amplitude resolution than conventional CD audio. However, it is unclear whether people can perceive the difference in sound quality between high-resolution and conventional CD audio. A previous study suggested that only musicians could perceive the difference, so this paper employed non-musicians as subjects. The listening experiment used the sound-image localization over headphones that arises from the difference in quantization accuracy between high-resolution and conventional CD audio. Results confirm, by chi-squared test, that nine out of ten subjects could discriminate the two significantly: the average correct-answer rate across all subjects was 46.88%, above the 33% chance level, confirming that non-musicians are able to identify the difference in quantization.
AES E-Library Permalink:
http://www.aes.org/e-lib/browse.cfm?elib=19650
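A chi-squared goodness-of-fit test against a 33% chance level can be sketched as below. The trial counts here are hypothetical, not the study's data; the 3.841 threshold is the standard 5% critical value for one degree of freedom.

```python
def chi_square_vs_chance(correct, trials, p_chance=1.0 / 3.0):
    """Goodness-of-fit statistic (1 degree of freedom) for correct vs
    incorrect counts against a chance probability p_chance."""
    exp_c = trials * p_chance
    exp_i = trials * (1.0 - p_chance)
    obs_i = trials - correct
    return ((correct - exp_c) ** 2 / exp_c
            + (obs_i - exp_i) ** 2 / exp_i)

# Hypothetical subject: 20 correct out of 32 trials against 33% chance.
stat = chi_square_vs_chance(20, 32)
significant = stat > 3.841  # 5% critical value, 1 degree of freedom
```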
[EB1-11]
Title:
Three-dimensional large-scale loudspeaker array system using high-speed 1-bit signal for immersive sound field reproduction
Authors:
Kakeru Kurokawa (Tokyo Denki University), Izumi Tsunokuni (Tokyo Denki University), Yusuke Ikeda (Tokyo Denki University), Naotoshi Osaka (Tokyo Denki University), and Yasuhiro Oikawa (Waseda University)
Abstract:
Since the number of loudspeakers needed to physically control a high-precision or wide-area sound field is large, the reproduction system becomes complicated and challenging to construct. We previously developed a 256-channel planar loudspeaker array system that can be implemented with simple circuits by using high-speed 1-bit signals. In this study, we have extended the previous system and propose a 320-channel three-dimensional loudspeaker array system. Introducing an ARM/FPGA SoC has made it possible to control a large number of audio signals via the Linux OS and thus operate the system over a network. To investigate the sound fields reproducible with the proposed system, simulations of sound field reproduction by Wave Field Synthesis and Near-Field Compensated Higher Order Ambisonics were conducted.
AES E-Library Permalink:
http://www.aes.org/e-lib/browse.cfm?elib=19651
[EB1-12]
Title:
Sound field reproduction in prism-type arrangement of loudspeaker array by using local sound field synthesis
Authors:
Izumi Tsunokuni (Tokyo Denki University), Kakeru Kurokawa (Tokyo Denki University), Yusuke Ikeda (Tokyo Denki University), and Naotoshi Osaka (Tokyo Denki University)
Abstract:
Many methods and systems have been researched to physically reproduce three-dimensional sound fields. To reproduce a sound field up to higher frequencies, Local Sound Field Synthesis has been proposed, using focused sources that virtually enclose the listener's head. However, the shape and discretization of the loudspeaker array strongly influence reproduction accuracy and the reproducible frequency range. In this paper, the influence of loudspeaker-array spacing and shape on reproduction accuracy at each frequency is clarified by simulation. To facilitate implementation of the loudspeaker array, we conducted simulations with prism-type and cylindrical loudspeaker arrays excluding the ceiling and floor surfaces.
AES E-Library Permalink:
http://www.aes.org/e-lib/browse.cfm?elib=19652
[EB1-13]
Title:
Multipresence and Autofocus for Interpreted Narrowcasting
Authors:
Michael Cohen (University of Aizu) and Kojima Hiromasa (University of Aizu)
Abstract:
The emergence of ubicomp, IoT, and XR (AR, MR, & VR) paradigms, including mobile-ambient interfaces integrating personal control and situated resources, offers new interactivity options. These FMC (fixed-mobile convergence) ideas are reified in a proof-of-concept featuring narrowcasting integrating mobile and crowd-sensed position: interpreted spatial soundscapes compiling narrowcasting with multipresence.
AES E-Library Permalink:
http://www.aes.org/e-lib/browse.cfm?elib=19653

P9: Signal/Spatial Processing [Day 3 (Aug 9) 9:00–10:15]

Chair: Christoph Sladeczek

[P9-1]
Title:
Structured sparsity for sound field reproduction with overlapping groups: Investigation of the latent group lasso
Authors:
Philippe-Aubert Gauthier (Université de Sherbrooke), Pierre Grandjean (Université de Sherbrooke), and Alain Berry (Université de Sherbrooke)
Abstract:
Sound field reproduction (SFR) applied to microphone array recordings can be defined as an inverse problem. A typical issue is the resulting spatial source signal distribution: a least-squares approach with 2-norm regularization of the solution will activate all sources. To circumvent such issues, sparsity-based regularization offers promising advantages; however, it can result in overly sparse solutions. A promising avenue is structured sparsity for predefined groups of reproduction sources. To this end, the group lasso was recently investigated without overlapping groups, but its inability to deal with overlapping groups is a limitation. This paper presents the latent group lasso (LGL), where group-level sparsity is induced for overlapping groups. Theoretical results of LGL are successfully compared to other SFR results for sparsity-inducing norms.
AES E-Library Permalink:
http://www.aes.org/e-lib/browse.cfm?elib=19627
[P9-2]
Title:
VISR — A versatile open software framework for audio signal processing
Authors:
Andreas Franck (ISVR University of Southampton) and Filippo Fazi (ISVR University of Southampton)
Abstract:
Software plays an increasingly important role in spatial and object-based audio. Real-time and interactive rendering is often needed to subjectively evaluate and demonstrate algorithms, requiring significant implementation effort and often impeding the reproducibility of scientific research. In this paper we present the VISR (Versatile Interactive Scene Renderer), a modular, open-source software framework for audio processing. VISR enables systematic reuse of DSP functionality, rapid prototyping in C++ or Python, and integration into the typical workflow of audio research and development, from initial implementation and offline objective evaluation to subjective testing. This paper provides a practical, example-based introduction to the VISR framework, demonstrated with an interconnected example spanning algorithm design and implementation, dynamic binaural auralization, and a subjective test.
AES E-Library Permalink:
http://www.aes.org/e-lib/browse.cfm?elib=19628
[P9-3]
Title:
Comparison of microphone distances for real-time reverberation enhancement system using optimal source distribution technique
Authors:
Atsushi Marui (Tokyo University of the Arts), Motoki Yairi (Kajima Technical Research Institute), and Toru Kamekawa (Tokyo University of the Arts)
Abstract:
Although musical instrument players may want to learn the spatial impression of a concert hall beforehand, practicing in the actual hall is impractical for geographical and financial reasons. Hence, it is helpful for players if a spatial reproduction system can recreate concert hall reverberation in their practice environments. An implementation of the crosstalk cancellation method for synthesizing virtual auditory space over loudspeakers, called the Optimal Source Distribution technique, is used to add realistic reverberation of a live venue to a violin performance in real time. In this research, one of the defining parameters for the timbre of the convolved reverberation, the microphone placement on and over the instrument, was compared by professional players.
AES E-Library Permalink:
http://www.aes.org/e-lib/browse.cfm?elib=19629
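Crosstalk cancellation in its simplest symmetric form inverts a 2x2 matrix of loudspeaker-to-ear transfer functions at each frequency. The sketch below is a generic single-frequency illustration with made-up path gains; the Optimal Source Distribution technique additionally distributes this inversion over frequency-dependent loudspeaker positions, which is not modeled here.

```python
import cmath

def ctc_filters(h_ii, h_ic):
    """Invert the symmetric 2x2 plant [[h_ii, h_ic], [h_ic, h_ii]] at one
    frequency: returns (c_ii, c_ic) so that plant x canceller = identity."""
    det = h_ii * h_ii - h_ic * h_ic
    return h_ii / det, -h_ic / det

# Made-up symmetric path gains at a single frequency: the contralateral
# path is attenuated and delayed relative to the ipsilateral path.
h_ii = 1.0 + 0.0j
h_ic = 0.4 * cmath.exp(-1.2j)

c_ii, c_ic = ctc_filters(h_ii, h_ic)

# Drive the "left ear only" target through the canceller and the plant:
# the left ear receives the signal, the right ear receives nothing.
ear_l = h_ii * c_ii + h_ic * c_ic
ear_r = h_ic * c_ii + h_ii * c_ic
```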

P10: Production-1 [Day 3 (Aug 9) 10:30–12:10]

Chair: Aki Mäkivirta

[P10-1]
Title:
Subjective and objective evaluation of 9ch three-dimensional acoustic music recording techniques
Authors:
Will Howie (McGill University), Denis Martin (McGill University), David H. Benson (McGill University), Jack Kelly (McGill University), and Richard King (McGill University)
Abstract:
A study was undertaken to compare four different 9-channel three-dimensional acoustic music recording techniques, all optimized for capturing a solo piano. The four techniques range in design philosophy: spaced, near-coincident, and coincident. Results of a subjective listening test showed the two spaced techniques as being equally highly rated for the subjective attributes “naturalness of sound scene”, “naturalness of timbre”, and “sound source image size”. Listeners rated the coincident technique significantly lower than all other techniques under investigation for all perceptual attributes. Binaural recordings of the stimuli were analyzed using several different objective measures, some of which were found to be good predictors for the perceptual attributes “envelopment” and “sound source image size”.
AES E-Library Permalink:
http://www.aes.org/e-lib/browse.cfm?elib=19630
[P10-2]
Title:
An Approach for Object-Based Spatial Audio Mastering
Authors:
Simon Hestermann (Fraunhofer), Mario Seideneck (Fraunhofer), and Christoph Sladeczek (Fraunhofer)
Abstract:
Object-based spatial audio is an approach for interactive three-dimensional audio reproduction. This concept changes not only the way content creators interact with the audio, but also how it is saved and transmitted. Hence, a new process in the reproduction chain, called rendering, has to be established; the rendering process generates loudspeaker signals from an object-based scene. Although recording and mixing have been subjects of research in recent years, concepts for object-based mastering are largely missing. The main difference compared to channel-based mastering is that audio objects need to be altered instead of loudspeaker channels, which requires a fundamentally new concept for mastering. In this paper, a new method for mastering object-based audio is presented.
AES E-Library Permalink:
http://www.aes.org/e-lib/browse.cfm?elib=19631
[P10-3]
Title:
Efficacy of using spatial averaging in improving the listening area of immersive audio monitoring in programme production
Authors:
Aki Mäkivirta (Genelec) and Thomas Lund (Genelec)
Abstract:
Loudness-based production in broadcast relies on accurate monitoring for judging balance, formats, speech intelligibility, and audio quality. Reproduction systems for monitoring immersive audio presentations share certain assumptions about the monitoring system setup. These include flat frequency response at the listening location, normalized monitoring level, and the same level and time of flight from all monitors to the listening location. The boundary loading in the monitoring rooms tends to modify the frequency responses of the different loudspeakers in the immersive monitoring system, uniquely for the specific location of each loudspeaker. When using equalization to remove response differences between loudspeakers, no significant differences were found between single point measurements and spatial averages for small spatial averaging displacements.
AES E-Library Permalink:
http://www.aes.org/e-lib/browse.cfm?elib=19632
[P10-4]
Title:
Eclectic Space — Dynamic Processing, Temporal Envelope and Sampling as Spatial Stagers in Contemporary Pop Music
Author:
Anders Reuter (University of Copenhagen)
Abstract:
Pop is now almost exclusively composed in the DAW. This has meant that tools that aren't meant to be spatial stagers are becoming exactly that, through expanded use and, not least, cross-pollination of techniques. This paper explores how dynamic processing (compression, gating), temporal envelope, and sampling are used as spatial manipulators in contemporary pop. The result of this development is often unnatural or even surreal spatial experiences. The paper suggests the term eclectic space for this new kind of spatial staging, using the song “XXXO” from the album Maya by M.I.A. (2010) as a case study.
AES E-Library Permalink:
http://www.aes.org/e-lib/browse.cfm?elib=19633

P11: Production-2 Audio System-2 [Day 3 (Aug 9) 14:45–16:25]

Chair: Ephraim Hahn

[P11-1]
Title:
Sound-Space Choreography
Author:
Gerard Erruz (Eurecat) and Timothy Schmele (Eurecat)
Abstract:
As part of the European project BINCI, a new software tool to control sound sources in periphonic space, called Choreographer, is presented. The Choreographer allows users to define sets of three-dimensional movements; its integration into the Binaural Home Studio (BHS) 3D audio production suite allows users to trigger and combine these movements in real-time performances. In parallel with the development process, the aesthetic possibilities of sound movement have been investigated. We discuss the relationships between sets of musical elements, notation, and composition, and we review the ways modern dance has explored space and movement. The results of this investigation have been used to inform design proposals and musicologically informed features for the Choreographer.
AES E-Library Permalink:
http://www.aes.org/e-lib/browse.cfm?elib=19635
[P11-2]
Title:
Should sound and image be coherent during live performances?
Authors:
Etienne Hendrickx (University of Brest), Julian Palacino (Feichter Electronics), Vincent Koehl (University of Brest), Frédéric Changenet (Radio France), Etienne Corteel (L-Acoustics), and Mathieu Paquier (University of Brest)
Abstract:
During amplified live performances, the position of the sound sources rarely coincides with the position of their related visual sources. Yet, several studies have suggested that audiovisual spatial coherence could improve the experience of the audience. In the present experiment, sound engineers had to produce two different mixes for several sequences extracted from concert videos: one freestyle mix where they could spatialize the sound sources at their will, and one spatially-constrained mix, where the sound sources had to be reproduced at the same azimuths as the original positions of the related visual sources. A subjective evaluation revealed that the spatially-constrained mixes were preferred when the image was projected, although the freestyle mixes were preferred when the image was not projected.
AES E-Library Permalink:
http://www.aes.org/e-lib/browse.cfm?elib=19636
[P11-3]
Title:
A framework for intelligent metadata adaptation in object-based audio
Authors:
James Woodcock (University of Salford), Jon Francombe (BBC Research and Development), Andreas Franck (University of Southampton), Philip Coleman (University of Surrey), Richard Hughes (University of Salford), Hansung Kim (University of Surrey), Qingju Liu (University of Surrey), Dylan Menzies (University of Southampton), Marcos Simon Galvez (University of Southampton), Yan Tang (University of Salford), Tim Brookes (University of Surrey), William J. Davies (University of Salford), Bruno M. Fazenda (University of Salford), Russell Mason (University of Surrey), Trevor J. Cox (University of Salford), Filippo Maria Fazi (University of Southampton), Philip J. B. Jackson (University of Surrey), and Chris Pike (BBC Research and Development)
Abstract:
Object-based audio can be used to customize, personalize, and optimize audio reproduction depending on the specific listening scenario. To investigate and exploit the benefits of object-based audio, a framework for intelligent metadata adaptation was developed. The framework uses detailed semantic metadata that describes the audio objects, the loudspeakers, and the room. It features an extensible software tool for real-time metadata adaptation that can incorporate knowledge derived from perceptual tests and/or feedback from perceptual meters to drive adaptation and facilitate optimal rendering. One use case for the system is demonstrated through a rule-set (derived from perceptual tests with experienced mix engineers) for automatic adaptation of object levels and positions when rendering 3D content to two- and five-channel systems.
AES E-Library Permalink:
http://www.aes.org/e-lib/browse.cfm?elib=19637
[P11-4]
Title:
Dual frequency band amplitude panning for multichannel audio systems
Authors:
Richard Hughes (University of Salford), Andreas Franck (University of Southampton), Trevor Cox (University of Salford), Ben Shirley (University of Salford), and Filippo Maria Fazi (University of Southampton)
Abstract:
Panning laws for multi-loudspeaker setups, for example vector base amplitude panning, are typically derived based on either low or high frequency assumptions. It is well known, however, that auditory cues for both localization and loudness differ at these frequencies. This paper investigates the use of dual-band panning, whereby low and high frequency gains and normalization are applied separately. Single- and dual-band rendering algorithms are described, with predicted theoretical localization, spectral balance, and source extent cues given. A multiple stimulus test is described, performed for a number of trajectories rendered to a periphonic nine-loudspeaker ITU-R BS.2051-1 setup, with rated attributes based on predicted sources of error. Results are reported, detailing differences between both theoretical predictions and systems under test.
AES E-Library Permalink:
http://www.aes.org/e-lib/browse.cfm?elib=19638
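The low/high-frequency distinction above comes down to how the panning gains are normalized: coherent summation at the ears at low frequencies suggests an amplitude constraint, while incoherent summation at high frequencies suggests a power constraint. A minimal sketch using the tangent law for a stereo pair (the panning law, angles, and normalizations here are illustrative assumptions, not the paper's renderer):

```python
import math

def dual_band_gains(angle_deg, spread_deg=45.0):
    """Tangent-law gains for a source at angle_deg within a +/-spread pair,
    normalized for low frequencies (amplitude: sum g = 1, coherent
    summation) and for high frequencies (power: sum g^2 = 1, incoherent)."""
    t = math.tan(math.radians(angle_deg)) / math.tan(math.radians(spread_deg))
    g = (1.0 + t, 1.0 - t)
    amp = g[0] + g[1]
    rms = math.sqrt(g[0] ** 2 + g[1] ** 2)
    return tuple(x / amp for x in g), tuple(x / rms for x in g)

low, high = dual_band_gains(0.0)  # centre source: equal gains in each band
```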

P12: Audio Systems [Day 3 (Aug 9) 16:40–17:30]

Chair: Hiraku Okumura

[P12-1]
Title:
Sign-Agnostic Matrix Design for Spatial Artificial Reverberation with Feedback Delay Networks
Authors:
Sebastian Schlecht (International Audio Laboratories Erlangen) and Emanuel Habets (International Audio Laboratories Erlangen)
Abstract:
Feedback delay networks (FDNs) are an efficient tool for creating artificial reverberation. Recently, various designs for spatially extending the FDN were proposed. A central topic in the design of spatial FDNs is the choice of the feedback matrix that governs the interaction between spatially distributed elements and therefore the spatial impression. In the design prototype, the feedback matrix is chosen to be unilossless such that the reverberation time is infinite. However, in physics- and aesthetics-driven design of spatial FDNs, the target feedback matrix is not necessarily unilossless. This contribution proposes an optimization method for finding a close unilossless feedback matrix and improves the accuracy by relaxing the specification of the target matrix phase component and focussing on the sign-agnostic component.
AES E-Library Permalink:
http://www.aes.org/e-lib/browse.cfm?elib=19639
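For context, a feedback delay network with an orthogonal feedback matrix (a scaled Hadamard matrix below) is energy preserving, which is the lossless case the "unilossless" design prototype generalizes. The sketch is a generic textbook FDN with made-up delays and gain, not the authors' spatial design:

```python
# 4x4 Hadamard matrix scaled by 0.5 is orthogonal, hence energy preserving.
H = [[1, 1, 1, 1],
     [1, -1, 1, -1],
     [1, 1, -1, -1],
     [1, -1, -1, 1]]
A = [[0.5 * h for h in row] for row in H]

def fdn(x, delays=(149, 211, 263, 293), g=0.9, n_out=2000):
    """Run input x through a 4-line feedback delay network.

    Each output sample sums the delay-line reads; the reads are mixed by
    the orthogonal matrix A, attenuated by g (setting a finite decay
    time), and written back with the input. Delays and g are made up."""
    lines = [[0.0] * d for d in delays]
    heads = [0] * len(delays)
    out = []
    for n in range(n_out):
        reads = [lines[i][heads[i]] for i in range(len(delays))]
        out.append(sum(reads))
        xin = x[n] if n < len(x) else 0.0
        for i in range(len(delays)):
            fb = sum(A[i][j] * reads[j] for j in range(len(delays)))
            lines[i][heads[i]] = g * fb + xin
            heads[i] = (heads[i] + 1) % len(lines[i])
    return out

ir = fdn([1.0])  # impulse response: a decaying reverberant tail
```

With g = 1 the orthogonal matrix makes the loop lossless (infinite reverberation time); g < 1 yields the decaying tail.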
[P12-2]
Title:
Musical Emotions Evoked by 3D Audio
Author:
Ephraim Hahn (McGill University)
Abstract:
Do 3D audio formats enhance the quality or intensity of musical emotions? In this study, listeners were presented with classical music excerpts in Stereo, 5.1 Surround, and Auro-3D 9.1 playback formats. Participants were asked to report their emotional states on the Geneva Emotional Music Scale (GEMS) while listening to two contrasting excerpts of Arnold Schönberg's string sextet “Verklärte Nacht”, op. 4. The results provide evidence that 3D audio can evoke stronger overall emotional arousal in listeners.
AES E-Library Permalink:
http://www.aes.org/e-lib/browse.cfm?elib=19640

Poster-3 (eB) [Day 3 (Aug 9) 13:15–14:30]

[EB2-1]
Title:
Individual binaural reproduction of music recordings using a virtual artificial head
Authors:
Mina Fallahi (Jade Hochschule Oldenburg), Matthias Blau (Jade Hochschule Oldenburg), Martin Hansen (Jade Hochschule Oldenburg), Simon Doclo (Carl von Ossietzky Universität Oldenburg), Steven van de Par (Carl von Ossietzky Universität Oldenburg), and Dirk Püschel (Akustik Technologie Göttingen)
Abstract:
As an alternative to traditional artificial heads, a virtual artificial head (VAH) comprising a microphone array-based beamformer can be used to preserve the spatial properties of the sound field in binaural recordings. The advantage of a VAH is the possibility to adapt the same recording post hoc to individual Head Related Transfer Functions (HRTFs) and to use head-tracking in the reproduction. Here, a narrow-band least-squares cost function is minimized to calculate the filter coefficients for the VAH with constraints on the resulting spectral accuracy and array robustness. Different versions of these constraints are applied to two simulated microphone arrays and their effect on the resulting binaural reproduction is discussed based on objective results and perceptual experiments with music content.
AES E-Library Permalink:
http://www.aes.org/e-lib/browse.cfm?elib=19654
[EB2-2]
Title:
Creation of immersive-sound content by utilizing sound intensity measurement. Part 1, Generating a 4-pi reverberation
Authors:
Masataka Nakahara (ONFUTURE Ltd. / SONA Corporation), Akira Omoto (Kyushu University / ONFUTURE Ltd.), and Yasuhiko Nagatomo (Evixar Inc.)
Abstract:
Capturing and reproducing 4-pi spatial sound is an important factor in creating immersive sound content, yet few tools are currently available for it. The authors therefore propose an effective production scheme that includes practical solutions for capturing, restoring, and editing a 4-pi reverberant field from a simple measurement of sound intensities. In this Part 1, the method for capturing and regenerating 4-pi reverberation, named “VSVerb,” is introduced. Because the reverberation is generated from the acoustic information of virtual sound sources detected from on-site sound intensity measurements, VSVerb provides high S/N performance and allows various acoustic parameters to be adjusted without additional measurements. The flexible functions of VSVerb contribute to efficient post-production work, as described in Part 3.
AES E-Library Permalink:
http://www.aes.org/e-lib/browse.cfm?elib=19655
[EB2-3]
Title:
Creation of immersive-sound content by utilizing sound intensity measurement. Part 2, Obtaining a 3D panning table
Authors:
Takashi Mikami (SONA Corporation), Masataka Nakahara (ONFUTURE Ltd.), and Kazutaka Someya (beBlue Co., Ltd.)
Abstract:
The relationship between panning positions and the positions of the resulting phantom sound images is essential information for creating immersive sound content. To clarify this relationship, the authors conducted sound intensity measurements in a Dolby Atmos compliant mixing room and detected the positions of 1241 sound objects generated in the studio. By analyzing the detected object positions and the corresponding Atmos panner positions, a “3D panning map” of the studio is obtained. The map provides the physically proper panning position, i.e., the X, Y, Z values of the panner, for any desired position at which a mixing engineer intends to place a sound object. The proposed “3D panning map” tool contributes to efficient production of immersive sound content.
AES E-Library Permalink:
http://www.aes.org/e-lib/browse.cfm?elib=19656
[EB2-4]
Title:
Creation of immersive-sound content by utilizing sound intensity measurement. Part 3, Object-based post-production work
Authors:
Ritsuko Tsuchikura (SONA Corporation), Akiho Matsuo (SONA Corporation), and Masumi Takino (beBlue Co., Ltd.)
Abstract:
A new scheme for creating immersive sound content is proposed. The scheme utilizes the acoustic information of sound intensities measured in real spaces. To create a 3D sound atmosphere, the newly developed “VSVerb” and “3D panning table,” described in Parts 1 and 2, are used. VSVerb restores the 4-pi reverberation of a target space, and the 3D panning table provides the proper panning positions at which sound objects are intended to be placed. Using four different VSVerbs and the 3D panning table of a Dolby Atmos compliant mixing room, several pieces of immersive music content were created with a Dolby Atmos production tool. The production flow is described here. Demonstrations of the created content will be given at the presentation.
AES E-Library Permalink:
http://www.aes.org/e-lib/browse.cfm?elib=19657
[EB2-5]
Title:
An immersive 3D audio-visual installation based on sound field rendering and reproduction with higher-order ambisonics
Authors:
Shoken Kaneko (Yamaha Corporation) and Hiraku Okumura (Yamaha Corporation)
Abstract:
Immersive audio and sound field rendering/reproduction technology, such as higher-order ambisonics (HOA), has recently been receiving much attention, especially in the context of virtual reality (VR) technologies such as 360-degree movies and VR video games. In this work, we present the technical details of an audio-visual installation based on sound field rendering and reproduction with HOA, which was installed at Yamaha's headquarters in Hamamatsu, Japan. The 3D audio content was created using a 3D-audio-enabled Digital Audio Workstation (DAW). We review the details of the system and the content creation workflow, and discuss possible directions for future immersive 3D/VR audio content creation tools.
AES E-Library Permalink:
http://www.aes.org/e-lib/browse.cfm?elib=19658
[EB2-6]
Title:
Reduction of Number of Inverse Filters for Boundary Surface Control
Authors:
Hiroshi Kashiwazaki (Kyushu University) and Akira Omoto (Kyushu University)
Abstract:
Boundary surface control (BoSC) is a useful method for physically reproducing a sound field. However, playback requires multi-channel inverse filtering, making real-time processing difficult on a simple computer such as a laptop. Therefore, the authors propose a “rough” reproduction system based on thinning out the inverse filters of BoSC. This engineering report discusses which inverse filters should be thinned out. Numerical simulations show that the accuracy of the sound field reproduction decreases only slightly when a directional microphone array is used and the filters associated with loudspeakers far from the microphone are deleted.
AES E-Library Permalink:
http://www.aes.org/e-lib/browse.cfm?elib=19659
[EB2-7]
Title:
A novel method for simulating movement in multichannel reverberation providing enhanced listener envelopment
Authors:
Joshua Jaworski (The University of Sydney) and William Martens (The University of Sydney)
Abstract:
A novel reverberation simulation algorithm was designed and developed to simulate movement and enhance listener envelopment in spatial soundfields created via multi-channel reproduction systems. Whereas previous methods have most often employed modulated delay lines, or some other means for continuous modulation of the spatio-temporal distribution of reverberation components, the algorithm on which this paper reports utilizes time-segmentation processes commonly characterized as granulation. Therefore, utilizing the process herein termed 'Dynamic Overlap-and-Add' (DOLA), spatial movement within a reverberant soundfield was achieved without the inherent Doppler shifting typically associated with continuous delay modulation. The resulting spatial imagery was evaluated by expert listeners in order to identify the most aesthetically pleasing outputs of the novel reverberation algorithm.
AES E-Library Permalink:
http://www.aes.org/e-lib/browse.cfm?elib=19660
[EB2-8]
Title:
Investigation of practical compensation method for “so-called” bone conduction headphones with a focus on spatialization
Author:
Akiko Fujise (Panasonic Corporation)
Abstract:
Hearing devices that vibrate around the tragus have been adopted as “bone conduction (BC) headphones” and, like open-ear headphones using air conduction (AC), have potential for auditory mixed reality. However, the reproduction of sound images with these devices is yet to be established, since the multiple sound/vibration transmission paths are still under exploration. Although relevant findings in human-machine interaction and audiology have been reported recently, more work is needed to bridge these research areas sufficiently for technical implementation. Accordingly, a practical compensation method for spatialization is investigated: a combined lumped-parameter circuit model of the BC transmission pathways is used to compensate head-related transfer functions for AC. This presentation reports preliminary results and discusses a possible compensation procedure.
AES E-Library Permalink:
http://www.aes.org/e-lib/browse.cfm?elib=19661
[EB2-9]
Title:
Acoustic validation of a BEM-suitable 3D mesh of KEMAR
Authors:
Kat Young (University of York), Gavin Kearney (University of York), and Anthony I. Tew (University of York)
Abstract:
Binaural audio research frequently employs head-related transfer functions (HRTFs). However, acoustic measurement of large numbers of HRTFs can be impractical. An alternative is boundary element method (BEM) simulation using 3D meshes. This paper describes the acoustic validation of a 3D mesh of KEMAR, previously validated against the manufacturer's CAD file. Differences between acoustic and simulated binaural cues were analysed for consistency. After compensating for systematic errors, the mean difference in interaural time difference was 21.9 μs. The mean differences in monaural and interaural spectral content were 2.2 dB and 2.9 dB, respectively. It is concluded that the mesh is suitable for BEM simulations below 20 kHz, provided that just-noticeable perceptual differences are acceptable, and should prove useful in a variety of research applications.
AES E-Library Permalink:
http://www.aes.org/e-lib/browse.cfm?elib=19662
[EB2-10]
Title:
Spatial impression of source widening effect for binaural audio production
Authors:
Hengwei Su (Tokyo University of the Arts), Atsushi Marui (Tokyo University of the Arts), and Toru Kamekawa (Tokyo University of the Arts)
Abstract:
Controlling the width of sound objects is necessary for natural and complex spatial reproduction. To investigate the effect on spatial impression when applied to audio production, source widening processing for binaural synthesis was implemented as a VST plugin that can be used in a digital audio workstation. The plugin was provided for mixing sound effects for video clips, controlling the localization and width of sound objects for binaural reproduction. Multiple versions of the mix were created, and a subjective listening experiment was conducted to evaluate the overall spatial impression of the mixes. The results indicated that the source widening effect can improve the spatial impression, although its performance may depend on the nature of the source or on individual preferences.
AES E-Library Permalink:
http://www.aes.org/e-lib/browse.cfm?elib=19663
[EB2-11]
Title:
A method to measure the binaural response based on the reciprocity by using the dummy head with sound sources
Authors:
Wan-Ho Cho (Korea Research Institute of Standards and Science) and Ji-Ho Chang (Korea Research Institute of Standards and Science)
Abstract:
A novel approach to measuring a binaural response is proposed based on the reciprocity principle. A modified dummy head system with two sound sources located at the ear positions is designed for this purpose. With this dummy head source system, the binaural response can be measured by placing microphones at the field points that would originally be the source positions. The proposed method can resolve the difficulties of binaural measurement related to installing a sound source at various positions. The system also has advantages in measuring the near-field head-related transfer function and in evaluating the contribution of the acoustic pathway to the auditory response.
AES E-Library Permalink:
http://www.aes.org/e-lib/browse.cfm?elib=19664
[EB2-12]
Title:
Virtual Ensemble System with three-dimensional Sound Field Reproduction using ‘Sound Cask’
Authors:
Yuko Watanabe (Tokyo Denki University), Tomoya Fukuda (Tokyo Denki University), Shin Iwai (Tokyo Denki University), Rina Sasaki (Tokyo Denki University), and Shiro Ise (Tokyo Denki University)
Abstract:
In this paper, we propose a virtual-ensemble system based on a three-dimensional sound field reproduction system built on the boundary-surface control (BoSC) principle. The BoSC system consists of an 80-ch fullerene-shaped microphone array and an immersive sound reproduction system equipped with 96 loudspeakers, referred to as the “Sound Cask”; it provides high-performance sound localization and thus higher acoustic reality than conventional sound reproduction systems. As one useful application, a sound-field simulator is proposed in which a musician can play a musical instrument inside the system and experience the feeling of a concert-hall-like space. In addition, we are developing a simulation system for ensemble playing that acoustically connects two Sound Casks, referred to as the “virtual-ensemble system”.
AES E-Library Permalink:
http://www.aes.org/e-lib/browse.cfm?elib=19665