AES Paris 2016
Paper Session P21

P21 - Immersive Audio: Part 1


Tuesday, June 7, 08:45 — 11:45 (Room 352B)

Chair:
Bob Schulein, RBS Consultants / ImmersAV Technology - Schaumburg, IL, USA

P21-1 Low-Complexity Stereo Signal Decomposition and Source Separation for Application in Stereo to 3D Upmixing
Sebastian Kraft, Helmut-Schmidt-University - Hamburg, Germany; Udo Zölzer, Helmut-Schmidt-University - Hamburg, Germany
In this paper we present a general low-complexity stereo signal decomposition approach. Based on a common stereo signal model, the panning coefficients and azimuth positions of the sources in a stereo mix are estimated. In the next step, this information is used to separate direct and ambient signal components. The simple algorithm can be implemented at low computational cost, and its application in a stereo-to-3D upmix context is described. Particular focus is put on the generation of additional ambient channels by using decorrelation filters in a tree structure. Finally, the separation performance is evaluated with several standard measures and compared to other algorithms. [Also a poster; see session P22-5]
Convention Paper 9586
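
As a rough illustration of the panning-coefficient estimation mentioned in the P21-1 abstract, the following Python sketch recovers the panning angle of a single amplitude-panned source from the channel energies. It assumes a constant-power (sine/cosine) panning law and exactly one source with no ambience. The function names `pan` and `estimate_pan_angle` are hypothetical, and this is not the authors' algorithm, which the abstract does not specify.

```python
import math


def pan(source, theta):
    """Pan a mono source with a constant-power law (assumed here).

    theta = 0 is hard left, theta = pi/2 is hard right.
    """
    g_left, g_right = math.cos(theta), math.sin(theta)
    return [g_left * s for s in source], [g_right * s for s in source]


def estimate_pan_angle(left, right):
    """Estimate the panning angle in radians from the two channel energies."""
    e_left = sum(x * x for x in left)
    e_right = sum(x * x for x in right)
    if e_left + e_right == 0.0:
        raise ValueError("silent input: panning angle is undefined")
    # The square roots of the energies are proportional to the panning gains.
    return math.atan2(math.sqrt(e_right), math.sqrt(e_left))


# Example: a test tone panned slightly left of centre.
source = [math.sin(0.1 * n) for n in range(200)]
left, right = pan(source, 0.3)
theta_est = estimate_pan_angle(left, right)
print(f"estimated panning angle: {theta_est:.3f} rad")
```

Practical decompositions typically apply such an estimate per time-frequency bin rather than to the whole signal, so that several sources at different positions can be told apart.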

P21-2 Immersive Audio Delivery Using Joint Object Coding
Heiko Purnhagen, Dolby Sweden AB - Stockholm, Sweden; Toni Hirvonen, Dolby Laboratories - Stockholm, Sweden; Lars Villemoes, Dolby Sweden - Stockholm, Sweden; Jonas Samuelsson, Dolby Sweden AB - Stockholm, Sweden; Janusz Klejsa, Dolby Sweden AB - Stockholm, Sweden
Immersive audio experiences (3D audio) are an important element of next-generation audio entertainment systems. This paper presents joint object coding techniques that enable the delivery of object-based immersive audio content (e.g., Dolby Atmos) at low bit rates. This is achieved by conveying a multichannel downmix of the immersive content using perceptual audio coding algorithms, together with parametric side information that enables the reconstruction of the audio objects from the downmix in the decoder. An advanced joint object coding tool is part of the AC-4 system recently standardized by ETSI. Joint object coding is also used in a backward-compatible extension of the Dolby Digital Plus system. Listening test results illustrate the performance of joint object coding in these two applications. [Also a poster; see session P22-7]
Convention Paper 9587
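
The idea in the P21-2 abstract of reconstructing objects from a downmix plus parametric side information can be sketched in a few lines of Python. This toy version assumes a mono downmix and a single broadband gain per object, whereas practical systems typically work on a multichannel downmix in time-frequency tiles. The names `energy`, `encode`, and `decode` are hypothetical, and nothing here reflects the actual AC-4 or Dolby Digital Plus tools.

```python
import math


def energy(signal):
    """Sum of squared samples."""
    return sum(v * v for v in signal)


def encode(objects):
    """Return a mono downmix and one gain per object as side information."""
    downmix = [sum(samples) for samples in zip(*objects)]
    e_dmx = energy(downmix)
    # Each gain stores the object's energy relative to the downmix energy.
    gains = [math.sqrt(energy(obj) / e_dmx) if e_dmx > 0.0 else 0.0
             for obj in objects]
    return downmix, gains


def decode(downmix, gains):
    """Approximate every object as a scaled copy of the downmix."""
    return [[g * v for v in downmix] for g in gains]


# Example: two short objects that do not overlap in time.
objects = [[1.0, 0.0, -1.0, 0.0],
           [0.0, 0.5, 0.0, -0.5]]
downmix, gains = encode(objects)
reconstructed = decode(downmix, gains)
for original, approx in zip(objects, reconstructed):
    print(energy(original), energy(approx))
```

With one broadband gain, the decoder only restores each object's energy, not its waveform. Estimating the parameters in many narrow time-frequency tiles is what lets a parametric scheme of this kind approximate the individual objects more closely.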

P21-3 Design and Subjective Evaluation of a Perceptually-Optimized Headphone Virtualizer
Grant Davidson, Dolby Laboratories, Inc. - San Francisco, CA, USA; Dan Darcy, Dolby Laboratories, Inc. - San Francisco, CA, USA; Louis Fielder, Dolby - San Francisco, CA, USA; Zhiwei Schuang, Dolby Laboratories Intl. Services Co. Ltd. - Beijing, China; Rich Graff, Dolby Laboratories, Inc. - San Francisco, CA, USA; Jeroen Breebaart, Dolby Australia Pty. Ltd. - McMahons Point, NSW, Australia; Poppy Crum, Dolby Laboratories - San Francisco, CA, USA
We describe a novel method for designing echoic headphone virtualizers based on a stochastic room model and a numerical optimization procedure. The method aims to maximize sound source externalization under a natural-timbre constraint. The stochastic room model generates a number of binaural room impulse response (BRIR) candidates for each virtual channel, each embodying essential perceptual cues. A perceptually-based distortion metric evaluates the timbre of each candidate, and the optimal candidate is selected for use in the virtualizer. We designed a 7.1.4 channel virtualizer and evaluated it relative to a LoRo stereo downmix using a single-interval A:B preference test. For a pool of 10 listeners, the test resulted in an overall virtualizer preference of 75%, with no stereo test item preferred over binaural.
Convention Paper 9588
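
The generate, evaluate, and select loop described in the P21-3 abstract follows a common pattern, sketched below in Python. Everything specific is an assumption: candidates are reduced to random per-band levels in dB rather than full BRIRs, and `timbre_distortion` is a stand-in metric (RMS deviation from a flat response), not the perceptually based metric of the paper.

```python
import math
import random


def make_candidate(rng, n_bands=8, spread_db=6.0):
    """Draw one hypothetical candidate: a per-band level response in dB."""
    return [rng.uniform(-spread_db, spread_db) for _ in range(n_bands)]


def timbre_distortion(response_db):
    """Stand-in metric: RMS deviation of the band levels from their mean."""
    mean = sum(response_db) / len(response_db)
    return math.sqrt(sum((v - mean) ** 2 for v in response_db)
                     / len(response_db))


def select_best(candidates):
    """Pick the candidate with the lowest distortion score."""
    return min(candidates, key=timbre_distortion)


rng = random.Random(0)  # fixed seed so the run is repeatable
candidates = [make_candidate(rng) for _ in range(20)]
best = select_best(candidates)
print(f"lowest distortion: {timbre_distortion(best):.2f} dB")
```

In the paper's setting this selection would be repeated for each virtual channel, with the externalization cues built into the candidate generator rather than into the scoring step.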

P21-4 An Open 3D Audio Production Chain Proposed by the Edison 3D Project
Etienne Corteel, Sonic Emotion Labs - Paris, France; David Pesce, b<>com - Cesson-Sévigné, France; Raphael Foulon, Sonic Emotion Labs - Paris, France; Gregory Pallone, b<>com - Cesson-Sévigné, France; Frédéric Changenet, Radio France - Paris, France; Hervé Dejardin, Radio France - Paris, France
In this paper we present a production chain for Next Generation Audio formats that combines standard digital audio workstations with external 3D audio rendering software using an open communication protocol. We first describe the scope of the Edison 3D project. In a second part, we revisit existing 3D audio formats, outlining the need for an open format for content creation and archiving. We then describe the tools developed in Edison 3D, which enable user interaction, storage of object positions in the timeline, monitoring of audio content in various rendering formats (stereo, 5.1, binaural, WFS, HOA), and export into the open and recently standardized (ITU BS.2076) 3D audio format, the Audio Definition Model. We finally provide an outlook into future work.
Convention Paper 9589

P21-5 Perceptual Evaluation of Transpan for 5.1 Mixing of Acoustic Recordings
Gaëtan Juge, Paris Conservatory (CNSMDP) - Paris, France; Amandine Pras, Paris Conservatoire (CNSMDP) - Paris, France; Stetson University - DeLand, FL, USA; Ilja Frissen, McGill University - Montreal, Quebec, Canada
We evaluate the efficiency of a 3D spatialization software tool named Transpan in the context of mixing acoustic recordings on a 5.1 reproduction system. The study aims to investigate whether the binaural processing with cross-talk cancellation (XTC) implemented in Transpan can improve the localization of lateral sources and their stability through listeners’ movements. We administered a listening test to 22 expert listeners in Paris and in Berlin. The test consisted of comparisons between two mixes, with and without Transpan binaural/XTC panning, for four classical music excerpts under five listening conditions, i.e., at the sweet spot and while performing specific movements. Quantitative analysis of multiple choice questions showed that Transpan can enlarge the 5.1 sweet spot area toward the rear speakers. From qualitative analysis of participants’ feedback emerged five main categories of comments, namely Localization stability; Precise localization accuracy; Vague localization accuracy; Timbral and spectral artifacts; and Spatial differences. Together, the results show that Transpan allows for better source lateralization in 5.1 mixing. [Also a poster; see session P22-5]
Convention Paper 9590

P21-6 The Influence of Head Tracking Latency on Binaural Rendering in Simple and Complex Sound Scenes
Peter Stitt, LIMSI Université Paris-Saclay - Orsay, France; Etienne Hendrickx, Paris Conservatory (CNSMDP) - Paris, France; Jean-Christophe Messonnier, CNSMDP Conservatoire de Paris - Paris, France; Brian Katz, LIMSI-CNRS - Orsay, France
Head tracking has been shown to improve multiple aspects of binaural rendering for single sound sources, for example by reducing front-back confusions. This paper presents the results of an AB experiment to investigate the influence of tracker latency on the perceived stability of virtual sounds. The stimuli used are a single frontal sound source and a complex (five-source) sound scene. A comparison is performed between the results for the simple and complex sound scenes and the head motions of the subjects for various latencies. The perceptibility threshold was found to be 10 ms higher for the complex scene than for the simple one. Subjects' head movement speeds were found to be 6 degrees/s faster for the complex scene.
Convention Paper 9591
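
To give a sense of scale for the latencies discussed in P21-6: for a constant rotation speed, the angle by which the rendered scene lags behind the head is simply speed times latency. The Python sketch below computes this. The function name and the example values are illustrative assumptions and are not taken from the paper.

```python
def angular_lag_deg(head_speed_deg_per_s, latency_ms):
    """Angle in degrees the head turns during the tracker latency.

    Assumes the head rotates at a constant speed over the latency interval.
    """
    if latency_ms < 0:
        raise ValueError("latency must be non-negative")
    return head_speed_deg_per_s * latency_ms / 1000.0


# Example: lag at an assumed head speed of 60 degrees/s.
for latency_ms in (10, 50, 100):
    lag = angular_lag_deg(60.0, latency_ms)
    print(f"{latency_ms} ms latency -> {lag:.1f} degrees of lag")
```

Because the lag grows with head speed, the same latency produces a larger angular error for faster head movements.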


