AES San Francisco 2012
Paper Session P8
P8 - Emerging Audio Technologies
Saturday, October 27, 9:00 am — 12:30 pm (Room 121)
Chair:
Agnieszka Roginska, New York University - New York, NY, USA
P8-1 A Method for Enhancement of Background Sounds in Forensic Audio Recordings—Robert C. Maher, Montana State University - Bozeman, MT, USA
A method for suppressing speech while retaining background sound is presented in this paper. The procedure is useful for audio forensics investigations in which a strong foreground sound source or conversation obscures subtle background sounds or utterances that may be important to the investigation. The procedure uses a sinusoidal speech model to represent the strong foreground signal and then performs a synchronous subtraction to isolate the background sounds that are not well-modeled as part of the speech signal, thereby enhancing the audibility of the background material.
Convention Paper 8731
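The subtraction step described in the abstract can be illustrated with a deliberately simplified sketch. Here a single stationary sinusoid of known frequency stands in for the foreground speech model; its amplitude and phase are fitted by least squares, and the fitted component is subtracted sample by sample to leave the background. The paper's sinusoidal speech model is far richer than this, and the signal parameters and function names below are hypothetical, chosen only for illustration.

```python
import math

def fit_sinusoid(signal, freq_hz, sample_rate):
    """Least-squares fit of a*cos(w*n) + b*sin(w*n) to the signal."""
    w = 2.0 * math.pi * freq_hz / sample_rate
    cc = ss = cs = xc = xs = 0.0
    for n, x in enumerate(signal):
        c = math.cos(w * n)
        s = math.sin(w * n)
        cc += c * c
        ss += s * s
        cs += c * s
        xc += x * c
        xs += x * s
    # Solve the 2x2 normal equations [[cc, cs], [cs, ss]] [a, b]^T = [xc, xs]^T.
    det = cc * ss - cs * cs
    a = (xc * ss - xs * cs) / det
    b = (xs * cc - xc * cs) / det
    return a, b

def subtract_sinusoid(signal, freq_hz, sample_rate):
    """Remove the fitted sinusoid, keeping whatever the model does not explain."""
    a, b = fit_sinusoid(signal, freq_hz, sample_rate)
    w = 2.0 * math.pi * freq_hz / sample_rate
    return [x - (a * math.cos(w * n) + b * math.sin(w * n))
            for n, x in enumerate(signal)]

# Hypothetical test signal: a strong 200 Hz "foreground" tone masking a
# weak 1000 Hz "background" tone (assumed values, not from the paper).
SAMPLE_RATE = 8000
N = 800
foreground = [math.cos(2.0 * math.pi * 200.0 * n / SAMPLE_RATE + 0.3)
              for n in range(N)]
background = [0.01 * math.sin(2.0 * math.pi * 1000.0 * n / SAMPLE_RATE)
              for n in range(N)]
mixture = [f + b for f, b in zip(foreground, background)]

residual = subtract_sinusoid(mixture, 200.0, SAMPLE_RATE)
```

In this toy case the residual matches the weak background tone closely, because the two tones are orthogonal over the chosen window. Real speech would need many time-varying partials tracked frame by frame.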
P8-2 Transient Room Acoustics Using a 2.5 Dimensional Approach—Patrick Macey, Pacsys Ltd. - Nottingham, UK
Cavity modes of a finite acoustic domain with rigid boundaries can be used to compute the transient response for a point source excitation. Previous work, considering steady state analysis, showed that for a room of constant height the 3-D modes can be computed very rapidly by computing the 2-D cross section modes. An alternative to a transient modal approach is suggested, using a trigonometric expansion of the pressure through the height. Both methods are much faster than 3-D FEM, but the trigonometric series approach can more easily include realistic damping. The accuracy of approximating an “almost constant height” room as constant height is investigated by example.
Convention Paper 8732
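The constant-height shortcut in the abstract rests on separability: with a rigid floor and ceiling, each 3-D cavity mode is a 2-D cross-section mode multiplied by cos(nπz/H), so its wavenumber satisfies k² = k_2D² + (nπ/H)². The following is a minimal sketch of that combination, assuming a rectangular cross section so that the 2-D modes are available in closed form (for a general cross section they would come from a 2-D numerical solver). The function names, room dimensions, and speed of sound are illustrative assumptions, not taken from the paper.

```python
import math

C = 343.0  # assumed speed of sound in air, m/s

def cross_section_wavenumbers(lx, ly, max_index):
    """Rigid-wall 2-D wavenumbers for an lx-by-ly rectangle (closed form)."""
    return [
        math.hypot(p * math.pi / lx, q * math.pi / ly)
        for p in range(max_index + 1)
        for q in range(max_index + 1)
    ]

def room_mode_frequencies(k2d_list, height, max_vertical):
    """Combine each 2-D wavenumber with the cos(n*pi*z/H) height terms."""
    freqs = []
    for k2d in k2d_list:
        for n in range(max_vertical + 1):
            kz = n * math.pi / height
            k3d = math.hypot(k2d, kz)  # k^2 = k_2D^2 + kz^2
            freqs.append(C * k3d / (2.0 * math.pi))
    return sorted(freqs)

# Hypothetical 5 m x 4 m x 3 m room.
k2d = cross_section_wavenumbers(5.0, 4.0, max_index=2)
freqs = room_mode_frequencies(k2d, height=3.0, max_vertical=2)
```

For this rectangular case the result can be checked against the familiar closed-form 3-D room-mode formula. The benefit of the 2.5-D route appears when the cross section is irregular and only the 2-D problem has to be solved numerically.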
P8-3 Multimodal Information Management: Evaluation of Auditory and Haptic Cues for NextGen Communication Displays—Durand Begault, Human Systems Integration Division, NASA Ames Research Center - Moffett Field, CA, USA; Rachel M. Bittner, New York University - New York, NY, USA; Mark R. Anderson, Dell Systems, NASA Ames Research Center - Moffett Field, CA, USA
Auditory communication displays within the NextGen data link system may use multiple synthetic speech messages replacing traditional air traffic control and company communications. The design of an interface for selecting among multiple incoming messages can impact both performance (time to select, audit, and release a message) and preference. Two design factors were evaluated: physical pressure-sensitive switches versus flat panel “virtual switches,” and the presence or absence of auditory feedback from switch contact. Performance with stimuli using physical switches was 1.2 s faster than virtual switches (2.0 s vs. 3.2 s); auditory feedback provided a 0.54 s performance advantage (2.33 s vs. 2.87 s). There was no interaction between these variables. Preference data were highly correlated with performance.
Convention Paper 8733
P8-4 Prototype Spatial Auditory Display for Remote Planetary Exploration—Elizabeth M. Wenzel, NASA Ames Research Center - Moffett Field, CA, USA; Martine Godfroy, NASA Ames Research Center - Moffett Field, CA, USA; San Jose State University Foundation; Joel D. Miller, Dell Systems, NASA Ames Research Center - Moffett Field, CA, USA
During Extra-Vehicular Activities (EVA), astronauts must maintain situational awareness (SA) of a number of spatially distributed "entities" such as other team members (human and robotic), rovers, and a lander/habitat or other safe havens. These entities are often outside the immediate field of view and visual resources are needed for other task demands. Recent work at NASA Ames has focused on experimental evaluation of a spatial audio augmented-reality display for tele-robotic planetary exploration on Mars. Studies compared response time and accuracy performance with different types of displays for aiding orientation during exploration: a spatial auditory orientation aid, a 2-D visual orientation aid, and a combined auditory-visual orientation aid under a number of degraded vs. nondegraded visual conditions. The data support the hypothesis that the presence of spatial auditory cueing enhances performance compared to a 2-D visual aid, particularly under degraded visual conditions.
Convention Paper 8734
P8-5 The Influence of 2-D and 3-D Video Playback on the Perceived Quality of Spatial Audio Rendering for Headphones—Amir Iljazovic, Fraunhofer Institute for Integrated Circuits IIS - Erlangen, Germany; Florian Leschka, Fraunhofer Institute for Integrated Circuits IIS - Erlangen, Germany; Bernhard Neugebauer, Fraunhofer Institute for Integrated Circuits IIS - Erlangen, Germany; Jan Plogsties, Fraunhofer Institute for Integrated Circuits IIS - Erlangen, Germany
Algorithms for processing spatial audio are becoming more attractive for practical applications as multichannel formats and processing power on playback devices enable more advanced rendering techniques. In this study the influence of the visual context on the perceived audio quality is investigated. Three groups of 15 listeners are presented with audio-only, audio with 2-D video, and audio with 3-D video content. The 5.1 channel audio material is processed for headphones using different commercial spatial rendering techniques. Results indicate a preference for spatial audio processing over a downmix to conventional stereo, with the effect being larger in the presence of 3-D video content. Also, the influence of video on perceived audio quality is significant for 2-D and 3-D video presentation.
Convention Paper 8735
P8-6 An Autonomous System for Multitrack Stereo Pan Positioning—Stuart Mansbridge, Queen Mary University of London - London, UK; Saorise Finn, Queen Mary University of London - London, UK; Birmingham City University - Birmingham, UK; Joshua D. Reiss, Queen Mary University of London - London, UK
A real-time system for automating stereo panning positions for a multitrack mix is presented. Real-time feature extraction of loudness and frequency content, constrained rules, and cross-adaptive processing are used to emulate the decisions of a sound engineer, and pan positions are updated continuously to provide spectral and spatial balance with changes in the active tracks. As such, the system is designed to be highly versatile and suitable for a wide range of applications, including both live sound and post-production. A real-time, multitrack C++ VST plug-in version has been developed. A detailed evaluation of the system is given, where formal listening tests compare the system against professional and amateur mixes from a variety of genres.
Convention Paper 8736
P8-7 DReaM: A Novel System for Joint Source Separation and Multitrack Coding—Sylvain Marchand, University of Western Brittany - Brest, France; Roland Badeau, Telecom ParisTech - Paris, France; Cléo Baras, GIPSA-Lab - Grenoble, France; Laurent Daudet, University Paris Diderot - Paris, France; Dominique Fourer, University Bordeaux - Talence, France; Laurent Girin, GIPSA-Lab - Grenoble, France; Stanislaw Gorlow, University of Bordeaux - Talence, France; Antoine Liutkus, Telecom ParisTech - Paris, France; Jonathan Pinel, GIPSA-Lab - Grenoble, France; Gaël Richard, Telecom ParisTech - Paris, France; Nicolas Sturmel, GIPSA-Lab - Grenoble, France; Shuhua Zang, GIPSA-Lab - Grenoble, France
Active listening, in which the listener interacts with the music as it plays, has numerous applications from pedagogy to gaming and involves advanced remixing processes such as generalized karaoke or respatialization. To gain this freedom, one might use the individual tracks that compose the mix. However, multitrack formats lose backward compatibility with popular stereo formats and increase the file size, while classic source separation from the stereo mix is not of sufficient quality. We propose a coder/decoder scheme for informed source separation. The coder determines the information necessary to recover the tracks and embeds it inaudibly in the mix, which is stereo and has a size comparable to the original. The decoder enhances the source separation with this information, enabling active listening.
Convention Paper 8737