AES Paris 2016
Paper Session P18

P18 - Human Factors and Interfaces


Monday, June 6, 14:45 — 15:45 (Room 352B)

Chair:
Sorgun Akkor, STD Ltd. - Istanbul, Turkey

P18-1 Robustness of Speaker Recognition from Noisy Speech Samples and Mismatched Languages
Ahmed Al-Noori, University of Salford - Salford, Greater Manchester, UK; Francis F. Li, University of Salford - Salford, UK; Philip J. Duncan, University of Salford - Salford, Greater Manchester, UK
Speaker recognition systems can typically attain high performance in ideal conditions. However, accuracy degrades significantly in channel-mismatched scenarios. Non-stationary environmental noises and their variations rank among the foremost challenges in speaker recognition. The Gammatone frequency cepstral coefficient (GFCC) method has been developed to improve the robustness of speaker recognition. This paper presents systematic comparisons between the performance of GFCC-based and conventional MFCC-based speaker verification systems on a purposely collected noisy speech data set. Furthermore, the current work extends the experiments to investigate language independence in the recognition phase. The results show that GFCC has better verification performance than MFCC in noisy environments. However, GFCC is more sensitive to language mismatch between the enrollment and recognition phases.
Convention Paper 9577

P18-2 A Reliable Singing Voice-Driven MIDI Controller Using Electroglottographic Signal
Kostas Kehrakos, TEI of Crete - E. Daskalaki, Greece; Christos Chousidis, Brunel University - Uxbridge, London, UK; Spyros Kouzoupis, TEI of Crete - E. Daskalaki, Greece
Modern synthesizers and software have created a need for encoding musical performance. However, the human singing voice, which is the dominant means of musical expression, lacks a reliable encoding system. This is because of the difficulty of extracting the necessary control information from its heavy harmonic content. In this paper, a novel singing voice MIDI controller based on electroglottography is proposed. The electroglottographic signal has lower harmonic content than audio, but it contains enough information to describe musical expression. The system uses autocorrelation for pitch extraction and a set of supplementary algorithms to provide information on dynamics and duration. The results show that the proposed system can serve as a platform for the development of reliable singing voice MIDI controllers.
Convention Paper 9578
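
The P18-2 abstract mentions autocorrelation for pitch extraction but gives no implementation details. The following is a minimal illustrative sketch of that general technique, not the authors' system: it estimates the fundamental frequency of one signal frame from the strongest autocorrelation peak and maps it to the nearest MIDI note number. The function names, the 80 to 1000 Hz search range, and the 0.3 voicing threshold are all assumptions chosen for the example.

```python
import math


def estimate_pitch_hz(frame, sample_rate, fmin=80.0, fmax=1000.0):
    """Estimate the fundamental frequency of one frame by autocorrelation.

    Returns None when the frame is silent or has no clear periodicity.
    fmin, fmax, and the 0.3 threshold below are illustrative assumptions.
    """
    n = len(frame)
    mean = sum(frame) / n
    x = [s - mean for s in frame]          # remove any DC offset
    energy = sum(s * s for s in x)
    if energy == 0.0:
        return None                        # silent frame

    # Only search lags that correspond to plausible pitches.
    min_lag = max(1, int(sample_rate / fmax))
    max_lag = min(n - 1, int(sample_rate / fmin))

    best_lag, best_score = None, 0.0
    for lag in range(min_lag, max_lag + 1):
        acf = sum(x[i] * x[i + lag] for i in range(n - lag))
        score = acf / energy               # normalized so that lag 0 would be 1.0
        if score > best_score:
            best_lag, best_score = lag, score

    if best_lag is None or best_score < 0.3:
        return None                        # treat as unvoiced
    return sample_rate / best_lag


def hz_to_midi_note(freq_hz):
    """Map a frequency to the nearest MIDI note number (A4 = 440 Hz = note 69)."""
    return int(round(69 + 12 * math.log2(freq_hz / 440.0)))


# Usage: a synthetic 220 Hz sine sampled at 8 kHz stands in for a real signal.
sample_rate = 8000
frame = [math.sin(2 * math.pi * 220.0 * i / sample_rate) for i in range(1024)]
pitch = estimate_pitch_hz(frame, sample_rate)
note = hz_to_midi_note(pitch)
```

Because the lag is an integer number of samples, the estimate is quantized; a practical controller would likely refine the peak by interpolation and smooth the result across frames, which this sketch omits.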

