AES New York 2018
Poster Session P08
P08 - Acoustics and Signal Processing
Thursday, October 18, 10:00 am — 11:30 am (Poster Area)
P08-1 Estimation of MVDR Beamforming Weights Based on Deep Neural Network—Moon Ju Jo, Gwangju Institute of Science and Technology (GIST) - Gwangju, Korea; Geon Woo Lee, Gwangju Institute of Science and Technology (GIST) - Gwangju, Korea; Jung Min Moon, Gwangju Institute of Science and Technology (GIST) - Gwangju, Korea; Choongsang Cho, Artificial Intelligence Research Center, Korea Electronics Technology Institute (KETI) - Sungnam, Korea; Hong Kook Kim, Gwangju Institute of Science and Technology (GIST) - Gwangju, Korea
In this paper we propose a deep learning-based MVDR beamforming weight estimation method. Whereas conventional MVDR beamforming requires knowledge of the source location, the proposed method estimates the beamforming weights with a deep neural network from GCC-PHAT features, without any source location information. In an experiment with REVERB challenge data, the root mean square error between the estimated weights and the true MVDR weights was found to be 0.32.
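For context, the conventional MVDR weights that the paper's network learns to approximate have a closed form, w = R⁻¹d / (dᴴR⁻¹d), where R is the noise spatial covariance and d is the steering vector toward the source; d is the quantity that requires the source location. A minimal numerical sketch of that closed form (not the authors' implementation; the array geometry and steering vector below are hypothetical):

```python
import numpy as np

def mvdr_weights(R, d):
    """Closed-form MVDR weights w = R^{-1} d / (d^H R^{-1} d).

    R: (M, M) noise spatial covariance (Hermitian positive definite);
    d: (M,) steering vector toward the source -- the term that requires
    knowledge of the source location.
    """
    Rinv_d = np.linalg.solve(R, d)
    return Rinv_d / (d.conj() @ Rinv_d)

# Toy check with a hypothetical 4-mic steering vector and white noise.
M = 4
d = np.exp(-1j * np.pi * 0.3 * np.arange(M))
w = mvdr_weights(np.eye(M, dtype=complex), d)
# Distortionless constraint: w^H d = 1.
```

With white noise (R = I) the weights reduce to a scaled delay-and-sum beamformer, which makes the distortionless constraint easy to verify by hand.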
Convention Paper 10068 (Purchase now)
P08-2 A Statistical Metric for Stability in Instrumental Vibrato—Sarah R. Smith, University of Rochester - Rochester, NY, USA; Mark F. Bocko, University of Rochester - Rochester, NY, USA
When instrumentalists perform with vibrato, they add a quasi-periodic frequency modulation to the note. Although this modulation is rarely purely sinusoidal, many methods for vibrato parameterization focus exclusively on the rate and depth of the frequency modulation, with less attention given to measuring how a performer’s vibrato changes over the course of a note. In this paper we interpret vibrato trajectories as instantiations of a random process that can be characterized by an associated autocorrelation function and power spectral density. From these functions, a coherence time can be estimated that describes the stability of the vibrato within a note. This metric can be used to characterize individual performers as well as to resynthesize vibratos of different styles.
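The coherence-time idea can be sketched numerically. The fragment below is one illustrative reading of the abstract, not the authors' method: it autocorrelates the analytic signal of a synthetic vibrato frequency trajectory (so the carrier oscillation factors out of the envelope) and reports the first lag at which the normalized envelope falls below 1/e.

```python
import numpy as np

def analytic_signal(x):
    """Analytic signal via the FFT (one-sided spectrum doubling)."""
    n = len(x)
    X = np.fft.fft(x)
    h = np.zeros(n)
    h[0] = 1.0
    h[1:(n + 1) // 2] = 2.0
    if n % 2 == 0:
        h[n // 2] = 1.0
    return np.fft.ifft(X * h)

def coherence_time(freq_traj, fs):
    """First lag (s) at which the normalized autocorrelation envelope
    of a zero-mean vibrato frequency trajectory falls below 1/e."""
    z = analytic_signal(freq_traj - np.mean(freq_traj))
    n = len(z)
    env = np.array([abs(np.vdot(z[:n - k], z[k:])) for k in range(n)])
    env /= env[0]
    below = np.nonzero(env < 1.0 / np.e)[0]
    return below[0] / fs if below.size else n / fs

# Hypothetical 6 Hz vibrato whose phase drifts (a random-walk jitter
# models an unstable vibrato; a stable one would keep env near 1 longer).
fs = 100.0
t = np.arange(0, 3.0, 1.0 / fs)
rng = np.random.default_rng(0)
phase = 2 * np.pi * 6.0 * t + np.cumsum(rng.normal(0.0, 0.2, t.size))
tau = coherence_time(30.0 * np.cos(phase), fs)  # +/-30 cent deviation
```

Larger phase jitter shortens the reported coherence time, matching the intuition that a less stable vibrato decorrelates faster.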
Convention Paper 10069 (Purchase now)
P08-3 Machine Learning Applied to Aspirated and Non-Aspirated Allophone Classification–An Approach Based on Audio “Fingerprinting”—Magdalena Piotrowska, Gdansk University of Technology - Poland; Hear Candy Mastering - Poland; Grazina Korvel, Vilnius University - Vilnius, Lithuania; Adam Kurowski, Gdansk University of Technology - Gdansk, Poland; Bozena Kostek, Gdansk University of Technology - Gdansk, Poland; Audio Acoustics Lab.; Andrzej Czyzewski, Gdansk University of Technology - Gdansk, Poland
The purpose of this study is to apply both Convolutional Neural Networks and a conventional learning algorithm to the allophone classification task. A list of words containing aspirated and non-aspirated allophones, pronounced by native and non-native English speakers, is recorded and then edited and analyzed. Allophones extracted from English speakers’ recordings are represented as two-dimensional spectrogram images and used as input to train the Convolutional Neural Networks. Various settings of the spectral representation are analyzed to determine an adequate option for allophone classification. Testing is then performed on non-native speakers’ utterances. The same approach is repeated with the conventional learning algorithm, but based on feature vectors. The achieved classification results are promising, as high accuracy is observed.
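The spectrogram-image front end can be sketched as follows (the FFT length and hop below are illustrative assumptions; the paper itself compares several settings of the spectral representation):

```python
import numpy as np

def log_spectrogram(x, n_fft=256, hop=64):
    """Log-magnitude spectrogram: the 2-D 'image' fed to the CNN.

    Window length and hop are illustrative choices, not the paper's.
    """
    win = np.hanning(n_fft)
    frames = [x[i:i + n_fft] * win
              for i in range(0, len(x) - n_fft + 1, hop)]
    S = np.abs(np.fft.rfft(frames, axis=1)).T   # (freq bins, time frames)
    return 20.0 * np.log10(S + 1e-10)

# Hypothetical 100 ms allophone segment at 16 kHz.
fs = 16000
t = np.arange(int(0.1 * fs)) / fs
seg = np.sin(2 * np.pi * 800 * t) + 0.1 * np.random.randn(t.size)
img = log_spectrogram(seg)
```

Each segment yields a fixed-height image (frequency bins by time frames) that can be batched for CNN training once the segments are padded or cropped to a common length.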
Convention Paper 10070 (Purchase now)
P08-4 Combining the Signals of Sound Sources Separated from Different Nodes after a Pairing Process Using an STFT-Based GCC-PHAT—César Clares-Crespo, University of Alcalá - Alcalá de Henares, Madrid, Spain; Joaquín García-Gómez, University of Alcalá - Alcalá de Henares, Madrid, Spain; Alfredo Fernández-Toloba, University of Alcalá - Alcalá de Henares, Madrid, Spain; Roberto Gil-Pita, University of Alcalá - Alcalá de Henares, Madrid, Spain; Manuel Rosa-Zurera, University of Alcalá - Alcalá de Henares, Madrid, Spain; Manuel Utrilla-Manso, University of Alcalá - Alcalá de Henares, Madrid, Spain
This paper presents a Blind Source Separation (BSS) algorithm designed to operate in a multi-node network. It is intended for indoor use and aims to overcome some limitations of classical BSS algorithms. The goal is to improve the quality of the separated speech sources by combining the signals of the same source obtained at the different nodes located in the room. The algorithm matches the signals belonging to the same source through cross-correlations. This lends stability to the quality of the separated sources, since it exploits the fact that each source is likely to be correctly separated at some of the nodes.
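The GCC-PHAT pairing step can be illustrated with a broadband FFT version of the same idea (a minimal sketch with hypothetical signal names, not the authors' STFT implementation): the whitened cross-correlation between separated outputs from two nodes peaks sharply when they carry the same source, so the peak height can serve as a pairing score.

```python
import numpy as np

def gcc_phat(a, b):
    """Whitened (PHAT) cross-correlation; returns (delay_samples, peak).

    The normalized peak height serves as a pairing score: separated
    outputs that carry the same source correlate strongly across nodes.
    """
    n = len(a) + len(b)
    X = np.fft.rfft(a, n) * np.conj(np.fft.rfft(b, n))
    cc = np.fft.irfft(X / (np.abs(X) + 1e-12), n)
    cc = np.concatenate((cc[-(n // 2):], cc[:n // 2 + 1]))
    k = int(np.argmax(cc))
    return k - n // 2, cc[k]

# Toy pairing: node 2 observes the same (hypothetical) source 5 samples late.
rng = np.random.default_rng(1)
src = rng.normal(size=1024)
node1 = src
node2 = np.concatenate((np.zeros(5), src[:-5]))
delay, score = gcc_phat(node1, node2)
# delay == -5: node2 lags node1 by 5 samples.
```

Pairs drawn from different sources produce a much lower peak, which is what makes a simple threshold on the score workable for matching outputs across nodes.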
Convention Paper 10071 (Purchase now)
P08-5 Reverberation Modeled as a Random Ensemble of Images—Stephen McGovern, Wire Grind Audio - Sunnyvale, CA, USA
Modeling box-shaped rooms with the image method requires many mathematical operations, and the resulting reverberation effect is qualitatively flawed by sweeping echoes and flutter. Efficiency is improved with the Fast Image Method, and sweeping echoes can be suppressed through randomization; with both approaches, however, flutter remains a problem. To address all of these issues, the Fast Image Method is modified to have both randomized image locations and increased symmetry. Additional optimizations are proposed and applied to the algorithm. The resulting audio effect has improved quality, and the computation time is dramatically reduced (by a factor usually exceeding 200) compared to the ancestral Allen and Berkley algorithm. Some relevant perceptual considerations are also discussed.
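A toy sketch of the randomized-image idea (illustrative only; the paper's algorithm also restructures the computation for speed and symmetry): mirror the source across the walls as in Allen and Berkley, but jitter each image position by a small random offset to break the lattice regularity behind sweeping echoes and flutter.

```python
import numpy as np
from itertools import product

def random_image_rir(room, src, mic, fs, order=3, beta=0.9, jitter=0.05,
                     seed=0):
    """Toy randomized image-method RIR (illustrative sketch).

    Image sources follow the Allen-Berkley lattice
    x_img = (1 - 2p) * x_src + 2 n L per axis, p in {0, 1},
    n in [-order, order], but each image is jittered by a small random
    offset (meters).  beta is a single frequency-flat wall reflection
    coefficient; c = 343 m/s.
    """
    c = 343.0
    rng = np.random.default_rng(seed)
    room, src, mic = (np.asarray(v, dtype=float) for v in (room, src, mic))
    h = np.zeros(int(fs * (2 * order + 1) * room.max() / c) + 1)
    for p in product((0, 1), repeat=3):
        for n in product(range(-order, order + 1), repeat=3):
            p_arr, n_arr = np.array(p), np.array(n)
            img = (1 - 2 * p_arr) * src + 2 * n_arr * room
            img = img + rng.uniform(-jitter, jitter, 3)  # randomized image
            dist = np.linalg.norm(img - mic)
            refl = beta ** (np.abs(n_arr - p_arr).sum() + np.abs(n_arr).sum())
            tap = int(round(dist / c * fs))
            if tap < h.size:
                h[tap] += refl / (4.0 * np.pi * max(dist, 1e-3))
    return h

fs = 8000
h = random_image_rir((5.0, 4.0, 3.0), (1.0, 1.0, 1.5), (3.5, 2.0, 1.2), fs)
```

The jitter leaves the echo density and decay envelope essentially unchanged while decorrelating echo arrival times, which is the perceptual mechanism the paper exploits.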
Convention Paper 10072 (Purchase now)
P08-6 On the Accuracy of Audience Implementations in Acoustic Computer Modelling—Ross Hammond, University of Derby - Derby, Derbyshire, UK; Peter Mapp Associates - Colchester, UK; Adam J. Hill, University of Derby - Derby, Derbyshire, UK; Peter Mapp, Peter Mapp Associates - Colchester, Essex, UK
Performance venue acoustics vary significantly with audience size, largely because of changes in absorption and reflection pathways. Creating acoustic models that accurately mimic these changes is problematic, with significant variance between audience implementation methods and modelling techniques. Changes in the total absorption per person due to audience size and density make the selection of absorption coefficients difficult. In this research, FDTD simulations confirm that for densely packed audiences, diffraction leads to a linear correlation between capacity and total absorption at low frequencies, while at high frequencies there is less increase in total absorption per person. The significance of diffraction renders ray-tracing inaccurate for individually modelled audience members and has further implications for the accuracy of standard audience modelling procedures.
Convention Paper 10073 (Purchase now)
P08-7 High Resolution Horizontal Arrays for Sound Stages and Room Acoustics: Concepts and Benefits—Cornelius Bradter, State University of New York Korea - Shinan ICT, Incheon, Korea; Kichel Ju, Shinan Information & Communication Co., Ltd. - Gyeonggi-do, Korea
Wavefield-synthesis systems were the first to apply large horizontal arrays to sound reproduction and live sound. While there has been some success with experimental music and sound art, results for mainstream music and live sound remain limited. The reasons for this are array sound quality, insufficient sound pressure level, spatial aliasing and interference, and inflexible array control algorithms. Nevertheless, high-resolution horizontal arrays with appropriate array control systems can produce exceptional sound quality, extensive flexibility to adapt to acoustic conditions, and remarkable control over sound distribution. Advanced systems can overcome many shortcomings of traditional sound systems. This paper describes the concepts and technology of high-performance HRH-arrays and their use for church sound, with examples of applications in Korean churches.
Convention Paper 10074 (Purchase now)