AES New York 2018
Paper Session P17
P17 - Semantic Audio
Saturday, October 20, 1:30 pm — 3:30 pm (1E11)
Chair:
Rachel Bittner, New York University - New York, NY, USA
P17-1 Audio Forensic Gunshot Analysis and Multilateration—Robert C. Maher, Montana State University - Bozeman, MT, USA; Ethan Hoerr, Montana State University - Bozeman, MT, USA
This paper considers the opportunities and challenges of acoustic multilateration in gunshot forensics cases. Audio forensic investigations involving gunshot sounds may draw on multiple simultaneous but unsynchronized recordings obtained in the vicinity of the shooting incident. These recordings may provide information useful to the forensic investigation, such as the location and orientation of the firearm and, if multiple guns were present, an answer to the common question “who shot first?” Sound source localization from multiple recordings typically employs time difference of arrival (TDOA) estimation and the related principles known as multilateration. In theory, multilateration can provide a good estimate of the sound source location, but in practice acoustic echoes, refraction, diffraction, reverberation, noise, and spatial/temporal uncertainty can confound the estimate.
Convention Paper 10100 (Purchase now)
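As background to the TDOA estimation the abstract mentions, the following is a minimal sketch (not the authors' method) of estimating a time difference of arrival between two recordings from the peak of their cross-correlation; the synthetic signals and all names are illustrative.

```python
import numpy as np

def estimate_tdoa(x, y, fs):
    """Estimate the time difference of arrival of x relative to y
    from the peak of their cross-correlation."""
    corr = np.correlate(x, y, mode="full")
    lag = np.argmax(corr) - (len(y) - 1)  # index len(y)-1 is zero lag
    return lag / fs

# Synthetic example: the same transient reaches mic B 5 ms after mic A.
fs = 8000
rng = np.random.default_rng(0)
pulse = rng.standard_normal(200)
mic_a = np.concatenate([np.zeros(100), pulse, np.zeros(700)])
mic_b = np.concatenate([np.zeros(140), pulse, np.zeros(660)])  # 40 samples later

tdoa = estimate_tdoa(mic_b, mic_a, fs)
print(tdoa)  # 0.005 (seconds)
```

In an actual multilateration problem, TDOAs from several microphone pairs would be combined, typically by least-squares fitting over candidate source positions, and the confounding factors the abstract lists (echoes, reverberation, noise, clock uncertainty) degrade or duplicate the correlation peak.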
P17-2 Speech Classification for Acoustic Source Localization and Tracking Applications Using Convolutional Neural Networks—Jonathan D. Ziegler, Stuttgart Media University - Stuttgart, Germany; Eberhard Karls University Tübingen - Tübingen, Germany; Andreas Koch, Stuttgart Media University - Stuttgart, Germany; Andreas Schilling, Eberhard Karls University Tübingen - Tübingen, Germany
Acoustic source localization and speaker tracking are gaining importance in fields such as human-computer interaction, hands-free operation of smart home devices, and telecommunication. A setup using a steered response power approach in combination with high-end professional microphone capsules is described, and the initial processing stages for detection-angle stabilization are outlined. The resulting localization and tracking can be improved in reactivity and angular stability by introducing a convolutional neural network for signal/noise discrimination tuned to speech detection. Training-data augmentation and network architecture are discussed; classification accuracy and the resulting performance gain of the entire system are analyzed.
Convention Paper 10101 (Purchase now)
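To illustrate the steered response power idea the abstract builds on, here is a minimal delay-and-sum sketch for a linear two-microphone array; this is not the authors' system, and the array geometry, grid, and signals are all illustrative.

```python
import numpy as np

def srp_angle(signals, mic_x, fs, c=343.0):
    """Delay-and-sum steered response power for a linear array:
    steer to each candidate angle and return the one whose
    beamformer output has the highest energy."""
    angles = np.linspace(-90, 90, 181)
    best_ang, best_energy = 0.0, -np.inf
    for ang in angles:
        # Far-field delay of each mic relative to the array center
        delays = mic_x * np.sin(np.deg2rad(ang)) / c
        shifts = np.round(delays * fs).astype(int)
        summed = np.zeros(len(signals[0]))
        for sig, k in zip(signals, shifts):
            summed += np.roll(sig, -k)  # undo the hypothesized delay
        energy = np.sum(summed ** 2)
        if energy > best_energy:
            best_ang, best_energy = ang, energy
    return best_ang

# Synthetic example: two mics 0.2 m apart, noise-like source at 30 degrees.
fs = 48000
rng = np.random.default_rng(1)
src = rng.standard_normal(4096)
mic_x = np.array([-0.1, 0.1])
true_shifts = np.round(mic_x * np.sin(np.deg2rad(30.0)) / 343.0 * fs).astype(int)
signals = [np.roll(src, k) for k in true_shifts]
est = srp_angle(signals, mic_x, fs)
print(est)  # within a few degrees of 30
```

The paper's contribution sits on top of such a localizer: a CNN-based speech/noise classifier gates the estimates so that the steering angle is updated only during speech, improving reactivity and angular stability.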
P17-3 Supervised Source Localization Using Spot Microphones—Miguel Ibáñez Calvo, International Audio Laboratories Erlangen - Erlangen, Germany; Maria Luis Valero, International Audio Laboratories Erlangen - Erlangen, Germany; Emanuël A. P. Habets, International Audio Laboratories Erlangen - Erlangen, Germany
Spatial microphones are used to acquire sound scenes, while spot microphones are commonly used to acquire individual sound sources with high quality. Both kinds of recordings are essential when producing spatial audio upmixes. However, to create upmixes automatically, or to assist audio engineers in creating them, information about the positions of the sources in the scene is also required. We propose a supervised sound source localization method that uses several spot microphones and a single spatial microphone to estimate the direction-of-arrival (DOA) of several simultaneously active sound sources in reverberant and noisy environments. The proposed method employs system identification techniques to estimate the relative impulse responses between each spot microphone and the spatial microphone, from which the DOAs can be extracted.
Convention Paper 10102 (Purchase now)
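The system identification step the abstract describes can be sketched as a least-squares deconvolution: estimate the relative impulse response between a spot microphone and one spatial-microphone channel, whose direct-path peak gives the relative delay from which a DOA follows given the array geometry. This is a simplified illustration under noiseless assumptions, not the authors' algorithm.

```python
import numpy as np

def relative_ir(spot, mic, length):
    """Least-squares estimate of the relative impulse response h such
    that mic ~= conv(spot, h), i.e., classic system identification."""
    n = len(mic)
    X = np.zeros((n, length))
    for k in range(length):          # column k: spot delayed by k samples
        X[k:, k] = spot[:n - k]
    h, *_ = np.linalg.lstsq(X, mic, rcond=None)
    return h

# Synthetic example: the spatial-mic channel is the spot signal delayed
# by 12 samples (direct path) plus a weaker echo at 30 samples.
rng = np.random.default_rng(3)
spot = rng.standard_normal(1000)
h_true = np.zeros(64)
h_true[12], h_true[30] = 0.8, 0.3
mic = np.convolve(spot, h_true)[:len(spot)]

h_est = relative_ir(spot, mic, 64)
direct_path = int(np.argmax(np.abs(h_est)))
print(direct_path)  # 12
```

With reverberation and several simultaneously active sources, as in the paper, the estimation is done adaptively per source and the DOA is read off more robustly than from a single peak.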
P17-4 Multichannel Fusion and Audio-Based Features for Acoustic Event Classification—Daniel Krause, AGH University of Science and Technology - Kraków, Poland; Fitech; Konrad Kowalczyk, AGH University of Science and Technology - Kraków, Poland
Acoustic event classification is of interest for various audio applications. This paper investigates a number of speech- and audio-based features for the task of acoustic event classification. Several features originating in general audio signal analysis are compared with features typically used in speech processing, such as mel-frequency cepstral coefficients (MFCCs). In addition, approaches to fusing the information obtained from multichannel recordings of an acoustic event are investigated. Experiments are performed using a Gaussian mixture model (GMM) classifier and audio signals recorded with several scattered microphones.
Convention Paper 10103 (Purchase now)
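To illustrate the classification pipeline the abstract outlines, here is a minimal single-channel sketch: frame-level features feed per-class Gaussian models, and a clip is assigned to the class with the highest total log-likelihood. The features below (log-energy and spectral centroid) are simple stand-ins for MFCCs, the models are one-component GMMs, and all signals are synthetic; none of this is the authors' implementation.

```python
import numpy as np

def frame_features(x, frame=256):
    """Crude per-frame features (log-energy and spectral centroid),
    stand-ins for the MFCCs used in the paper."""
    frames = x[:len(x) // frame * frame].reshape(-1, frame)
    spec = np.abs(np.fft.rfft(frames, axis=1))
    energy = np.log(np.sum(spec ** 2, axis=1) + 1e-12)
    bins = np.arange(spec.shape[1])
    centroid = np.sum(spec * bins, axis=1) / (np.sum(spec, axis=1) + 1e-12)
    return np.column_stack([energy, centroid])

class GaussianClass:
    """Single-Gaussian class model with diagonal covariance, i.e., a
    one-component GMM; real systems use several components."""
    def fit(self, feats):
        self.mu = feats.mean(axis=0)
        self.var = feats.var(axis=0) + 1e-6
        return self
    def loglik(self, feats):
        per_frame = -0.5 * (np.log(2 * np.pi * self.var)
                            + (feats - self.mu) ** 2 / self.var)
        return per_frame.sum()

# Two synthetic event classes: a low tone and a high tone in noise.
fs = 8000
t = np.arange(fs) / fs
rng = np.random.default_rng(2)
low = np.sin(2 * np.pi * 200 * t) + 0.1 * rng.standard_normal(fs)
high = np.sin(2 * np.pi * 1500 * t) + 0.1 * rng.standard_normal(fs)
models = {"low": GaussianClass().fit(frame_features(low)),
          "high": GaussianClass().fit(frame_features(high))}

# Classify an unseen clip by maximum total log-likelihood.
test_clip = np.sin(2 * np.pi * 1500 * t + 0.5) + 0.1 * rng.standard_normal(fs)
pred = max(models, key=lambda m: models[m].loglik(frame_features(test_clip)))
print(pred)  # high
```

The multichannel fusion the paper studies could enter such a pipeline either at the feature level (concatenating or averaging per-channel features) or at the decision level (combining per-channel log-likelihoods).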