AES New York 2013
Paper Session P13
Saturday, October 19, 9:00 am — 11:30 am (Room 1E07)
Paper Session: P13 - Applications in Audio—Part 2
Chair: Hans Riekehof-Boehmer, SCHOEPS Mikrofone - Karlsruhe, Germany
P13-1 Level-Normalization of Feature Films Using Loudness vs Speech—Esben Skovenborg, TC Electronic - Risskov, Denmark; Thomas Lund, TC Electronic A/S - Risskov, Denmark
We present an empirical study of the differences between level-normalization of feature films using the two dominant methods: loudness normalization and speech (“dialog”) normalization. The soundtracks of 35 recent “blockbuster” DVDs were analyzed using both methods. The difference in normalization level was up to 14 dB, and 5.5 dB on average. For all films the loudness method provided the lowest normalization level and hence the greatest headroom. Comparison of automatic speech measurement to manual measurement of dialog anchors shows a typical difference of 4.5 dB, with the automatic measurement producing the higher level. In a listening test that used the speech classifier to process rather than measure the films, the results suggested that the automatic measure is positively biased because it sometimes fails to distinguish between “normal speech” and speech combined with “action” sounds. Finally, the DialNorm values encoded in the AC-3 streams on the DVDs were compared to both the automatically and the manually measured speech levels and found to match neither one well.
AES 135th Convention Best Peer-Reviewed Paper Award Cowinner
Convention Paper 8983
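The relationship between the two normalization methods in P13-1 can be illustrated with simple level arithmetic. The sketch below is not taken from the paper: the function name, the target level, and the two measured levels are hypothetical values chosen only to show how a difference in normalization level arises when the same film is measured in two ways.

```python
def normalization_gain_db(measured_level_db, target_level_db):
    """Gain in dB that moves a measured program level to the target level."""
    return target_level_db - measured_level_db


# Hypothetical values for one film (assumptions, not data from the paper).
TARGET_LEVEL = -24.0       # assumed common target level
program_loudness = -20.0   # assumed whole-program loudness measurement
speech_level = -26.0       # assumed speech ("dialog") level measurement

# Gain each method would apply to reach the same target.
loudness_gain = normalization_gain_db(program_loudness, TARGET_LEVEL)
speech_gain = normalization_gain_db(speech_level, TARGET_LEVEL)

# Difference in normalization level between the two methods.
difference = speech_gain - loudness_gain

print(f"loudness-based gain: {loudness_gain:+.1f} dB")
print(f"speech-based gain:   {speech_gain:+.1f} dB")
print(f"difference:          {difference:.1f} dB")
```

With these assumed numbers the loudness-based method applies the lower gain, which corresponds to the lower normalization level and greater headroom described in the abstract.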
P13-2 Sound Identification from MPEG-Encoded Audio Files—Joseph G. Studniarz, Montana State University - Bozeman, MT, USA; Robert C. Maher, Montana State University - Bozeman, MT, USA
Numerous methods have been proposed for searching and analyzing long-term audio recordings for specific sound sources. It is increasingly common that audio recordings are archived using perceptual compression, such as MPEG-1 Layer 3 (MP3). Rather than performing sound identification upon the reconstructed time waveform after decoding, we operate on the undecoded MP3 audio data as a way to improve processing speed and efficiency. The compressed audio format is only partially processed using the initial bitstream unpacking of a standard decoder, but then the sound identification is performed directly using the frequency spectrum represented by each MP3 data frame. Practical uses are demonstrated for identifying anthropogenic sounds within a natural soundscape recording.
Convention Paper 8984
P13-3 Pilot Workload and Speech Analysis: A Preliminary Investigation—Rachel M. Bittner, New York University - New York, NY, USA; Durand R. Begault, Human Systems Integration Division, NASA Ames Research Center - Moffett Field, CA, USA; Bonny R. Christopher, San Jose State University Research Foundation, NASA Ames Research Center - Moffett Field, CA, USA
Prior research has questioned the effectiveness of speech analysis for measuring a talker's stress, workload, truthfulness, or emotional state. However, the question remains regarding the utility of speech analysis for restricted vocabularies such as those used in aviation communications. A part-task experiment was conducted in which participants performed Air Traffic Control read-backs in different workload environments. Participants' subjective workload and the speech qualities of fundamental frequency (F0) and articulation rate were evaluated. A significant increase in subjective workload rating was found for high workload segments. F0 was found to be significantly higher during high workload, while articulation rates were found to be significantly slower. No correlation was found between subjective workload and F0 or articulation rate.
Convention Paper 8985
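The abstract of P13-3 does not state how articulation rate was computed. A minimal sketch, assuming the common definition of syllables per second of speaking time with pauses excluded, is shown below; the function name and the example numbers are hypothetical and are not data from the study.

```python
def articulation_rate(syllable_count, total_duration_s, pause_durations_s):
    """Syllables per second of speaking time, with pauses excluded.

    This follows a common definition of articulation rate; the paper's
    exact procedure is not given in the abstract (assumption).
    """
    speaking_time_s = total_duration_s - sum(pause_durations_s)
    if speaking_time_s <= 0:
        raise ValueError("speaking time must be positive")
    return syllable_count / speaking_time_s


# Hypothetical read-back: 18 syllables in 5.0 s, including two 0.5 s pauses.
rate = articulation_rate(18, 5.0, [0.5, 0.5])
print(f"articulation rate: {rate:.2f} syllables/s")
```

A lower value for the same phrase under high workload would correspond to the slower articulation rate reported in the abstract.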
P13-4 Gain Stage Management in Classic Guitar Amplifier Circuits—Bryan Martin, McGill University - Montreal, QC, Canada
The guitar amplifier became a common tool in musical creation during the second half of the 20th Century. This paper attempts to detail some of the internal mechanisms by which the tones are created and their dependent interactions. Two early amplifier designs are examined to determine the circuit relationships and design decisions that came to define the sound of the electric guitar.
Convention Paper 8986
P13-5 Audio Pre-Equalization Models for Building Structural Sound Transmission Suppression—Cheng Shu, University of Rochester - Rochester, NY, USA; Fangyu Ke, University of Rochester - Rochester, NY, USA; Xiang Zhou, Bose Corporation - Framingham, MA, USA; Gang Ren, University of Rochester - Rochester, NY, USA; Mark F. Bocko, University of Rochester - Rochester, NY, USA
We propose a novel audio pre-equalization model that utilizes the transmission characteristics of building structures to reduce the interference reaching adjacent neighbors while maintaining the audio quality for the target listener. The audio transmission profiles are obtained by field acoustical measurements in several typical types of building structures. We also measure the spectrum of the audio to adapt the pre-equalization model to a specific audio segment. We apply a computational auditory model to (1) monitor the perceptual audio quality for the target listener and (2) assess the interference caused to adjacent neighbors. The system performance is then evaluated using subjective rating experiments.
Convention Paper 8987