AES Dublin 2019
Paper Session P09
P09 - Machine Learning: Part 1
Thursday, March 21, 14:00 — 15:30 (Meeting Room 2)
Chair:
Konstantinos Drossos, Tampere University of Technology - Tampere, Finland
P09-1 Feature Selection and its Evaluation in Binaural Ear Acoustic Authentication—Masaki Yasuhara, NIT, Nagaoka College - Nagaoka City, Niigata, Japan; Shohei Yano, Nagaoka College - Nagaoka City, Niigata, Japan; Takayuki Arakawa, NEC Corporation - Tokyo, Japan; Takafumi Koshinaka, NEC Corporation - Japan
Ear acoustic authentication is a type of biometric authentication based on the acoustic transfer characteristics of the ear canal. Because biometric information can be acquired from both ears, binaural features are available; however, the existing literature offers little guidance on using them to improve accuracy. In this study we experimentally determine features that best discriminate between users in order to achieve highly accurate authentication. Feature selection was performed by varying the combination of binaural features, and candidate combinations were evaluated using the ratio of between-class to within-class variance and the equal error rate (EER). We conclude that concatenating the features of both ears yields the highest performance.
Convention Paper 10160
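The selection criterion described in the abstract can be sketched compactly: a Fisher-style ratio of between-class to within-class variance for ranking feature subsets, and an equal error rate computed from genuine and impostor scores. The Python sketch below is an illustration only, not the authors' code; the data shapes, the synthetic features, and the left/right split are all assumptions.

# Minimal sketch (assumptions throughout): score a candidate feature subset by the
# ratio of between-class to within-class variance, and compute an EER from scores.
import numpy as np

def fisher_ratio(X, y):
    """Between-class / within-class variance ratio, averaged over feature dims.

    X: (n_samples, n_features) feature matrix; y: (n_samples,) user labels.
    """
    classes = np.unique(y)
    grand_mean = X.mean(axis=0)
    between = np.zeros(X.shape[1])
    within = np.zeros(X.shape[1])
    for c in classes:
        Xc = X[y == c]
        between += len(Xc) * (Xc.mean(axis=0) - grand_mean) ** 2
        within += ((Xc - Xc.mean(axis=0)) ** 2).sum(axis=0)
    return float(np.mean(between / (within + 1e-12)))

def equal_error_rate(genuine_scores, impostor_scores):
    """EER: operating point where false-accept rate equals false-reject rate."""
    thresholds = np.sort(np.concatenate([genuine_scores, impostor_scores]))
    far = np.array([(impostor_scores >= t).mean() for t in thresholds])
    frr = np.array([(genuine_scores < t).mean() for t in thresholds])
    idx = np.argmin(np.abs(far - frr))
    return (far[idx] + frr[idx]) / 2

# Toy usage: compare a left-ear-only subset against concatenated binaural features.
rng = np.random.default_rng(0)
y = np.repeat(np.arange(10), 20)                    # 10 users, 20 probes each
left = rng.normal(y[:, None], 1.0, size=(200, 8))   # synthetic left-ear features
right = rng.normal(y[:, None], 1.0, size=(200, 8))  # synthetic right-ear features
print("left only   :", fisher_ratio(left, y))
print("concatenated:", fisher_ratio(np.hstack([left, right]), y))
print("toy EER     :", equal_error_rate(rng.normal(1.0, 0.5, 200),
                                        rng.normal(0.0, 0.5, 2000)))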
P09-2 Deep Learning for Synthesis of Head-Related Transfer Functions—Sunil G. Bharitkar, HP Labs., Inc. - San Francisco, CA, USA
Ipsilateral and contralateral head-related transfer functions (HRTFs) are used to create the perception of a sound source at a virtual location. Publicly available databases cover only a subset of the full grid of angular directions because of the time and complexity involved in acquiring and deconvolving responses. In this paper we compare and contrast subspace-based techniques for reconstructing HRTFs at arbitrary directions from a sparse dataset (e.g., the IRCAM-Listen HRTF database), using (i) a hybrid (combined linear and nonlinear) approach pairing principal component analysis (PCA) with a fully-connected neural network (FCNN), and (ii) a fully nonlinear, deep-learning-based autoencoder (AE) approach. The AE-based approach improves on the hybrid approach in both objective and subjective tests, and we validate it on the MIT dataset.
Convention Paper 10161
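As a rough illustration of the hybrid approach (i) named in the abstract, the Python sketch below represents HRTF magnitude spectra with principal components and trains a small fully-connected network to map direction angles to component weights; reconstruction at an unmeasured direction is then the inverse PCA transform of the predicted weights. Everything here (synthetic spectra, grid sizes, network shape) is an assumption standing in for a real database such as IRCAM-Listen; it is not the paper's implementation.

# Minimal sketch (assumptions throughout): linear stage = PCA over spectra,
# nonlinear stage = FCNN from (azimuth, elevation) to PCA weights.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
n_dirs, n_bins = 200, 128
angles = np.column_stack([rng.uniform(0, 360, n_dirs),     # azimuth (deg)
                          rng.uniform(-45, 90, n_dirs)])   # elevation (deg)
# Fake log-magnitude HRTFs that vary smoothly with direction (placeholder data).
hrtfs = np.sin(np.radians(angles[:, :1])) * np.linspace(0, 1, n_bins) \
      + 0.1 * rng.normal(size=(n_dirs, n_bins))

pca = PCA(n_components=16).fit(hrtfs)      # linear stage: basis spectra
weights = pca.transform(hrtfs)             # per-direction component weights

net = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=2000, random_state=0)
net.fit(angles, weights)                   # nonlinear stage: angles -> weights

# Reconstruct an HRTF at an unmeasured direction (30 deg azimuth, 10 deg elevation).
query = np.array([[30.0, 10.0]])
estimate = pca.inverse_transform(net.predict(query))
print(estimate.shape)                      # (1, 128) log-magnitude spectrum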
P09-3 Bayesian Optimization of Deep Learning Techniques for Synthesis of Head-Related Transfer Functions—Sunil G. Bharitkar, HP Labs., Inc. - San Francisco, CA, USA; Timothy Mauer, HP, Inc. - San Francisco, CA, USA; Teresa Wells, HP, Inc. - San Francisco, CA, USA; David Berfanger, HP, Inc. - Vancouver, WA, USA
Head-related transfer functions (HRTFs) are used to create the perception of a virtual sound source at horizontal angle φ (azimuth) and vertical angle θ (elevation). Publicly available databases cover only a subset of the full grid of angular directions because of the time and complexity involved in acquiring and deconvolving responses. In this paper we build on our prior research [5] by extending the technique to HRTF synthesis on the IRCAM dataset while reducing the computational complexity of the autoencoder (AE) plus fully-connected neural network (FCNN) architecture by approximately 60% using Bayesian optimization. We also present listening-test results from a pilot study designed to assess the directional cues rendered by the proposed architecture, demonstrating its performance.
Convention Paper 10162
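Bayesian optimization of an architecture, as used in the abstract to shrink the AE+FCNN, can be sketched generically: model the validation loss as a Gaussian process over a hyperparameter and pick the next trial by expected improvement. The Python sketch below optimizes a single stand-in "bottleneck width" against a synthetic loss; the objective, search range, and GP settings are all assumptions, not the paper's configuration.

# Minimal sketch (assumptions throughout): GP surrogate + expected-improvement
# acquisition over one integer hyperparameter, minimizing a placeholder loss.
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

def validation_loss(width):
    # Placeholder for "train AE+FCNN with this bottleneck, return validation loss".
    return (width - 48) ** 2 / 1000 + 0.05 * np.sin(width / 3)

candidates = np.arange(8, 129).reshape(-1, 1).astype(float)  # search space
X = np.array([[8.0], [128.0]])                               # initial probes
y = np.array([validation_loss(x[0]) for x in X])

gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), alpha=1e-6, normalize_y=True)
for _ in range(15):
    gp.fit(X, y)
    mu, sigma = gp.predict(candidates, return_std=True)
    best = y.min()
    z = (best - mu) / np.maximum(sigma, 1e-9)
    ei = (best - mu) * norm.cdf(z) + sigma * norm.pdf(z)     # expected improvement
    x_next = candidates[np.argmax(ei)]
    X = np.vstack([X, x_next])
    y = np.append(y, validation_loss(x_next[0]))

print("best width:", X[np.argmin(y)][0], "loss:", y.min())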