AES London 2010
Poster Session P23
Tuesday, May 25, 10:30 — 12:00
(Room C4-Foyer)
P23 - Audio Processing—Music and Speech Signal Processing
P23-1 Beta Divergence for Clustering in Monaural Blind Source Separation—Martin Spiertz, Volker Gnann, RWTH Aachen University - Aachen, Germany
General-purpose audio blind source separation algorithms have to deal with a large dynamic range among the sources to be separated. The algorithm used here first separates the mixture into single notes. These notes are then clustered to reconstruct the melodies played by the active sources. Non-negative matrix factorization (NMF) gives good results when clustering the notes according to spectral features. The cost function for the NMF is controlled by the parameter beta, which should be adjusted according to the dynamic difference between the sources. The novelty of this paper is a simple unsupervised decision scheme that estimates the optimal beta, increasing the separation quality over a large range of dynamic differences.
Convention Paper 8130
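The role of beta in the P23-1 abstract can be illustrated with the standard textbook definition of the beta divergence. The sketch below is not the authors' decision scheme, which the abstract does not describe; the function names `beta_divergence` and `nmf_cost` are hypothetical and chosen only for illustration.

```python
import math

def beta_divergence(x, y, beta):
    """Standard beta divergence d_beta(x | y) for positive scalars.

    beta = 0 is the Itakura-Saito divergence, beta = 1 the generalized
    Kullback-Leibler divergence, beta = 2 half the squared Euclidean distance.
    """
    if x <= 0 or y <= 0:
        raise ValueError("x and y must be positive")
    if beta == 0:
        ratio = x / y
        return ratio - math.log(ratio) - 1.0
    if beta == 1:
        return x * math.log(x / y) - x + y
    return (x**beta + (beta - 1) * y**beta - beta * x * y**(beta - 1)) / (
        beta * (beta - 1)
    )

def nmf_cost(V, V_hat, beta):
    """Sum of element-wise beta divergences between a non-negative matrix V
    and its approximation V_hat (both given as lists of rows)."""
    return sum(
        beta_divergence(v, vh, beta)
        for row_v, row_vh in zip(V, V_hat)
        for v, vh in zip(row_v, row_vh)
    )

# Scaling both arguments by 10 leaves the beta = 0 cost unchanged
# and multiplies the beta = 2 cost by 10**2.
quiet = beta_divergence(2.0, 1.0, 2)
loud = beta_divergence(20.0, 10.0, 2)
```

Because the divergence scales as the beta-th power of the signal level, small values of beta weight quiet and loud components more evenly, which is one plausible reason the best beta depends on the dynamic difference between sources.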
P23-2 On the Effects of Room Reverberation in 3-D DOA Estimation Using a Tetrahedral Microphone Array—Maximo Cobos, Jose J. Lopez, Amparo Marti, Universidad Politécnica de Valencia - Valencia, Spain
This paper studies the accuracy of Direction-Of-Arrival (DOA) estimation for multiple sound sources using a small microphone array. Like other sparsity-based algorithms, the proposed method is able to work in underdetermined scenarios, where the number of sound sources exceeds the number of microphones. Moreover, the tetrahedral shape of the array allows DOAs to be estimated easily in three-dimensional space, which is an advantage over other existing approaches. However, since the proposed processing is based on an anechoic signal model, the estimated DOA vectors are severely affected by room reflections. Experiments to analyze the resultant DOA distribution under different room conditions and source arrangements are discussed using both simulations and real recordings.
Convention Paper 8131
P23-3 Long Term Cepstral Coefficients for Violin Identification—Ewa Lukasik, Poznan University of Technology - Poznan, Poland
Mel-scale cepstral coefficients have proved to be efficient features for speaker and musical instrument recognition. In this paper, Long Term Cepstral Coefficients (LTCCs) of solo musical phrases are used as features for identifying individual violins. LTCCs represent the envelope of the Long Term Average Spectrum (LTAS) on a linear frequency scale, which is useful for characterizing the subtleties of violin sound in the frequency domain. Results of the classification of 60 instruments are presented and discussed. It was shown that if the experts' knowledge is applied to analyze violin sound, the results may be promising.
Convention Paper 8132
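The relation between LTAS and cepstral coefficients described in the P23-3 abstract can be sketched as follows. This is only an illustration under stated assumptions: the per-frame power spectra are assumed to be precomputed on a linear frequency scale, the cepstrum is taken as an unnormalized DCT-II of the log spectrum, and the names `long_term_average_spectrum`, `dct2`, and `ltcc` are hypothetical. The paper's actual feature extraction may differ.

```python
import math

def long_term_average_spectrum(frames):
    """Average the per-frame power spectra bin by bin (assumed LTAS)."""
    n_bins = len(frames[0])
    return [sum(f[b] for f in frames) / len(frames) for b in range(n_bins)]

def dct2(values):
    """Unnormalized DCT-II of a sequence."""
    n = len(values)
    return [
        sum(v * math.cos(math.pi * k * (i + 0.5) / n) for i, v in enumerate(values))
        for k in range(n)
    ]

def ltcc(frames, n_coeffs, floor=1e-12):
    """Cepstral coefficients of the log LTAS; low orders describe its envelope.

    `floor` is an assumed guard against log(0), not taken from the paper.
    """
    ltas = long_term_average_spectrum(frames)
    log_spectrum = [math.log(max(p, floor)) for p in ltas]
    return dct2(log_spectrum)[:n_coeffs]

# A perfectly flat spectrum has no envelope shape, so every coefficient
# above order 0 vanishes (up to rounding).
flat_frames = [[2.0] * 8, [2.0] * 8]
flat_coeffs = ltcc(flat_frames, 4)
```

Keeping only the first few coefficients retains the smooth spectral envelope and discards fine harmonic detail, which is the usual motivation for cepstral features.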
P23-4 Adaptive Source Separation Based on Reliability of Spatial Feature Using Multichannel Acoustic Observations—Mitsunori Mizumachi, Kyushu Institute of Technology - Kitakyushu, Fukuoka, Japan
Sound sources can be separated by spatial filtering with multichannel acoustic observations. However, the appropriate algorithm must be chosen for each acoustic scene, and it is difficult to provide a suitable algorithm in real acoustic environments. In this paper an adaptive source separation scheme is proposed based on the reliability of a spatial feature, which gives an estimate of direction of arrival (DOA). As confidence measures for DOA estimates, the third and fourth moments of the spatial features are employed to measure how sharp their main lobes are. This paper proposes to selectively use either spatial filters or frequency-selective filters without spatial filtering, depending on the reliability of each DOA estimate.
Convention Paper 8133
P23-5 A Heuristic Text-Driven Approach for Applied Phoneme Alignment—Konstantinos Avdelidis, Charalampos Dimoulas, George Kalliris, George Papanikolaou, Aristotle University of Thessaloniki - Thessaloniki, Greece
The paper introduces a phoneme matching algorithm considering a novel concept of functional strategy. In contrast to the classic methodologies that focus on convergence to a fixed expected phonemic sequence (EPS), the presented method follows a more realistic approach. Based on text input, a soft EPS is populated, taking into consideration the structural and linguistic deviations that may appear in a naturally spoken sequence. The result of the matching process is evaluated using fuzzy inference and consists of both the phoneme transition positions and the actual phonemic content of the utterance. An overview of convergence quality performance through a series of runs for the Greek language is presented.
Convention Paper 8134
P23-6 Speech Enhancement with Hybrid Gain Functions—Xuejing Sun, Kuan-Chieh Yen, Cambridge Silicon Radio - Auburn Hills, MI, USA
This paper describes a hybrid gain function for single-channel acoustic noise suppression systems. The proposed gain function combines the Wiener filter and the Minimum Mean Square Error Log-Spectral Amplitude (MMSE-LSA) estimator gain functions and selects the respective gain values accordingly. Objective evaluation using a composite measure shows that the hybrid gain function yields better results than using either of the two functions alone.
Convention Paper 8135
P23-7 Human Voice Modification Using Instantaneous Complex Frequency—Magdalena Kaniewska, Gdansk University of Technology - Gdansk, Poland
The paper presents the possibilities of changing the human voice by modifying the instantaneous complex frequency (ICF) of the speech signal. The proposed method provides a flexible way of altering the voice without the need to find the fundamental frequency and formant positions or to detect voiced and unvoiced fragments of speech. The algorithm is simple and fast. Apart from ICF, it uses signal factorization into two factors: one fully characterized by its envelope and the other with positive instantaneous frequency. The ICFs of the factors are modified individually for different sound effects.
Convention Paper 8136
P23-8 Designing Optimal Phoneme-Wise Fuzzy Cluster Analysis—Konstantinos Avdelidis, Charalampos Dimoulas, George Kalliris, George Papanikolaou, Aristotle University of Thessaloniki - Thessaloniki, Greece
A large number of pattern classification algorithms and methodologies have been proposed for the phoneme recognition task over the last decades. The current paper presents a prototype distance-based fuzzy classifier, optimized for the needs of phoneme recognition. This is accomplished by a specially designed objective function and a corresponding training strategy. In particular, each phonemic class is represented by a number of arbitrarily shaped clusters that adaptively match the corresponding feature space distribution. The formulation of the approach is capable of delivering a variety of related conclusions based on fuzzy logic arithmetic. An overview of the inference capability is presented in combination with performance results for the Greek language.
Convention Paper 8137