AES Paris 2016
Paper Session P18
P18 - Human Factors and Interfaces
Monday, June 6, 14:45 — 15:45 (Room 352B)
Chair:
Sorgun Akkor, STD Ltd. - Istanbul, Turkey
P18-1 Robustness of Speaker Recognition from Noisy Speech Samples and Mismatched Languages—Ahmed Al-Noori, University of Salford - Salford, Greater Manchester, UK; Francis F. Li, University of Salford - Salford, UK; Philip J. Duncan, University of Salford - Salford, Greater Manchester, UK
Speaker recognition systems can typically attain high performance in ideal conditions. However, significant degradations in accuracy are found in channel-mismatched scenarios. Non-stationary environmental noises and their variations are among the foremost challenges in speaker recognition. The Gammatone frequency cepstral coefficient (GFCC) method has been developed to improve the robustness of speaker recognition. This paper presents systematic comparisons between the performance of GFCC-based and conventional MFCC-based speaker verification systems using a purposely collected noisy speech data set. Furthermore, the current work extends the experiments to investigate language independence in the recognition phase. The results show that GFCC has better verification performance than MFCC in noisy environments. However, GFCC shows a higher sensitivity to language mismatch between the enrollment and recognition phases.
Convention Paper 9577
P18-2 A Reliable Singing Voice-Driven MIDI Controller Using Electroglottographic Signal—Kostas Kehrakos, TEI of Crete - E. Daskalaki, Greece; Christos Chousidis, Brunel University - Uxbridge, London, UK; Spyros Kouzoupis, TEI of Crete - E. Daskalaki, Greece
Modern synthesizers and software have created a need for encoding musical performance. However, the human singing voice, which is the dominant means of musical expression, lacks a reliable encoding system. This is because of the difficulty of extracting the necessary control information from its heavy harmonic content. In this paper, a novel singing voice MIDI controller based on electroglottography is proposed. The electroglottographic signal has lower harmonic content than audio, but it contains enough information to describe musical expression. The system uses autocorrelation for pitch extraction and a set of supplementary algorithms to provide information on dynamics and duration. The results show that the proposed system can serve as a platform for the development of reliable singing voice MIDI controllers.
Convention Paper 9578
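The P18-2 abstract names autocorrelation as its pitch extraction method without giving details. As a rough illustration only, and not the authors' implementation, the following minimal sketch estimates the fundamental frequency of one frame by picking the lag with the largest autocorrelation value and maps the result to the nearest MIDI note number. The function names, the 80 to 1000 Hz search range, the frame length, and the synthetic sine wave standing in for an electroglottographic signal are all assumptions made for this example.

```python
import math


def estimate_pitch_autocorr(frame, sample_rate, f_min=80.0, f_max=1000.0):
    """Estimate the fundamental frequency (Hz) of one frame, or None if silent.

    Hypothetical helper: the search range f_min..f_max is an assumed default.
    """
    n = len(frame)
    mean = sum(frame) / n
    x = [s - mean for s in frame]          # remove any DC offset
    if sum(v * v for v in x) == 0.0:
        return None                        # silent frame, no pitch

    lag_min = int(sample_rate / f_max)     # shortest period considered
    lag_max = min(int(sample_rate / f_min), n - 1)

    best_lag, best_value = None, 0.0
    for lag in range(lag_min, lag_max + 1):
        # Unnormalized (biased) autocorrelation: longer lags sum fewer
        # products, which mildly favors the shortest matching period.
        r = sum(x[i] * x[i + lag] for i in range(n - lag))
        if r > best_value:
            best_lag, best_value = lag, r

    if best_lag is None:
        return None
    return sample_rate / best_lag


def frequency_to_midi_note(freq_hz):
    """Map a frequency to the nearest MIDI note number (A4 = 440 Hz = 69)."""
    return int(round(69 + 12 * math.log2(freq_hz / 440.0)))


# Usage: a synthetic 200 Hz sine stands in for a real signal frame.
sample_rate = 8000
frame = [math.sin(2 * math.pi * 200.0 * i / sample_rate) for i in range(1024)]
f0 = estimate_pitch_autocorr(frame, sample_rate)
if f0 is not None:
    print(f0, frequency_to_midi_note(f0))
```

A practical controller would also need onset detection, octave-error handling, and the dynamics and duration estimates that the abstract attributes to supplementary algorithms; none of those are covered by this sketch.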