Session F: MUSICAL ACOUSTICS
Saturday, May 11, 09:00-13:30 h

The presented study aims to extract parameters from musical sounds that are useful in the musical sound recognition process. For this purpose, time-frequency analysis employing various filters is performed on musical sounds representing twelve instrument classes. Three groups of instruments are taken into account: wind, string and percussive. Examples of wavelet analyses of various musical instrument sounds are presented. On this basis a number of parameters were extracted and statistically analyzed. Parameters that are mutually correlated are removed from the feature vector; in this way the feature vector can be reduced from dozens of parameters to a few of the most important ones. The feature vectors were then fed to the inputs of an Artificial Neural Network and classification experiments were performed. Furthermore, the originally developed Frequency Envelope Distribution method was applied to divide the musical signal into harmonic and inharmonic content. These signals were also parameterized and used in recognition experiments. Selected experimental results and the derived conclusions are included in the paper.

The extraction of semantic features from audio signals is a research area of central interest for the automatic classification, indexing and retrieval of audio material. Furthermore, real-time interactive systems that use sound as an input may also benefit from such technologies in order to achieve better interaction with real-world events. This paper addresses the analysis of acoustical music signals and, in particular, their transcription to MIDI (Musical Instrument Digital Interface). The development and implementation of an automatic transcription system are presented, and the results obtained are evaluated and discussed.

A method to identify piano notes and chords is presented. A simplified piano model based on physical properties has been developed for generating the spectral patterns used to identify notes and chords. The generated patterns are used to measure the correlation with the chord to be identified. The results obtained with typical values of the model parameters are already good, and they improve further if the model is trained. Successful recognition of three-note piano chords has been carried out.

Real-time physical modeling may have important applications in audio coding and in the music industry. Nevertheless, no efficient solutions have been proposed for modeling the radiation of a string instrument body when commuted synthesis cannot be applied. In this novel multi-rate approach, the string signal is split into two frequency bands. The lower band is filtered by a long FIR filter running at a considerably lower sampling rate, precisely synthesizing the body impulse response up to 2 kHz. In the high-frequency band, only the overall magnitude response of the body is modeled, using a low-order filter. The filters are calculated from measurements of real instruments. This enables the physical modeling of string instrument tones in real time with high sound quality.
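The multi-rate scheme above lends itself to a compact illustration. The following Python sketch assumes a 2 kHz crossover, a decimation factor of 8, and a measured low-frequency body impulse response supplied as body_ir_lf (sampled at the lower rate); the high-band coefficients are placeholders. It illustrates the structure of the approach, not the authors' actual filter design.

    import numpy as np
    from scipy import signal

    def body_filter_multirate(string_sig, body_ir_lf, fs=44100, fc=2000.0, decim=8):
        """Multi-rate body radiation sketch: long FIR on a decimated low band,
        low-order magnitude-only shaping on the high band."""
        # Split the string signal at the crossover frequency.
        b_lo, a_lo = signal.butter(4, fc / (fs / 2), btype='low')
        b_hi, a_hi = signal.butter(4, fc / (fs / 2), btype='high')
        lo = signal.lfilter(b_lo, a_lo, string_sig)
        hi = signal.lfilter(b_hi, a_hi, string_sig)

        # Low band: decimate, convolve with the long measured body impulse
        # response (assumed to be sampled at fs/decim), then interpolate back.
        lo_ds = signal.resample_poly(lo, 1, decim)
        lo_ds = signal.fftconvolve(lo_ds, body_ir_lf)[:len(lo_ds)]
        lo = signal.resample_poly(lo_ds, decim, 1)[:len(string_sig)]

        # High band: a low-order filter approximating only the overall
        # magnitude response of the body (placeholder coefficients; in
        # practice these would be fitted to body measurements).
        b_body, a_body = signal.butter(2, [0.2, 0.8], btype='band')
        hi = signal.lfilter(b_body, a_body, hi)

        return lo + hi

The saving comes from running the expensive FIR at fs/decim: covering the same impulse-response duration takes decim times fewer taps, and each tap is evaluated decim times less often.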
This paper illustrates a signal-adaptive analysis technique for transcribing monophonic sounds. Unlike other models, which segment audio by time-domain onset analysis, this model principally exploits pitch information; the pitch is locked after detection. The structure of a musical note, i.e. its harmonic frequency structure and time-envelope model, is exploited to segment and transcribe the signal. The system is inspired by the Integrated Processing and Understanding of Signals (IPUS) system, in which an abstract explanation and the best front-end configuration are searched for iteratively. Onsets and pitch are searched in two different domains and integrated with the system knowledge to give a coherent interpretation of the signal. The system successfully transcribes material ranging from fast trumpet riffs to long sustained violin vibrato.

We present a pitch estimation method for the separation of musical sounds. From a sinusoid-plus-autoregressive-noise model of the composite signal, we successively estimate the fundamental frequencies using the Minimum Description Length (MDL) principle, or Rissanen criterion. Equivalent to a penalized log-likelihood, this criterion also allows the onset and offset of each sound to be detected; the detection is based upon the likelihood gap between the estimated model and a simple autoregressive model. Several simulations on single and multiple musical sounds are performed to illustrate the effectiveness of the method.

The authors introduce the idea of performing Intelligent ICA to focus on and separate a specific instrument, voice or sound source of interest. This is achieved by incorporating into the ICA model high-level probabilistic priors that characterize each instrument or voice. For instrument modeling, we experimented with various feature sets previously used for instrument or speaker recognition. A Gaussian Mixture Model was trained in advance for each instrument. The dimension of the feature vector, the number of Gaussian mixture components and the length of the training audio data were kept to reasonably low levels.

We present a method that uses temporal information in musical audio to considerably improve time-scaling with pitch preservation. This work builds on a signal model in which the original signal is split into transient and steady-state components, based on an adaptive multi-resolution analysis. By considering only the transient signal content, temporal segmentation is achieved with a much higher degree of accuracy than with standard onset detection algorithms. Only the segmented steady-state regions are then stretched, while the phase is locked in the temporally masked regions around transients. Despite local variations in the stretching factor, rhythm is maintained globally, yielding perceptually very high quality results for a range of complex polyphonic musical signals at a low computational cost.
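The transient-preserving time-scaling just described can be approximated with off-the-shelf tools. The Python sketch below uses librosa's stock onset detector as a stand-in for the paper's adaptive multi-resolution transient analysis, and a plain phase-vocoder stretch in place of its phase-locking scheme; the 20 ms guard windows, the minimum segment length and the reliance on librosa are illustrative assumptions.

    import numpy as np
    import librosa

    def transient_aware_stretch(y, sr, factor, guard=0.02):
        """Stretch only the regions between detected onsets; short guard
        windows around each onset are copied unmodified so attacks stay
        crisp. Assumes onsets are separated by more than two guard lengths."""
        onsets = librosa.onset.onset_detect(y=y, sr=sr, units='samples')
        g = int(guard * sr)
        # Alternating boundaries: steady / transient / steady / ...
        bounds = [0]
        for o in onsets:
            bounds += [max(o - g, 0), min(o + g, len(y))]
        bounds.append(len(y))
        out = []
        for i in range(len(bounds) - 1):
            seg = y[bounds[i]:bounds[i + 1]]
            if len(seg) == 0:
                continue
            if i % 2 == 1 or len(seg) < 2048:
                out.append(seg)  # transient (or too short): keep as is
            else:
                out.append(librosa.effects.time_stretch(seg, rate=1.0 / factor))
        return np.concatenate(out)

As the abstract notes, a full implementation would raise the local stretching factor in the steady-state regions to compensate for the unstretched transients, so that the global rhythm matches the requested factor; the sketch omits this correction.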
This paper describes the development of an audio fingerprint called AudioDNA, designed to be robust against several distortions, including those related to radio broadcasting. A complete system is described, covering also a fast and efficient method for comparing observed fingerprints against a large database of reference fingerprints. The promising results achieved with a first prototype system, observing music titles as well as commercials, are presented.
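The abstract does not disclose AudioDNA's internals, so the sketch below shows a generic band-energy-difference fingerprint of the kind commonly used in broadcast monitoring, paired with a deliberately naive linear-scan matcher in place of the paper's fast database lookup. Frame sizes, band edges and all function names are assumptions.

    import numpy as np
    from scipy import signal

    def fingerprint(x, fs, n_bands=17, frame=0.37, hop=0.0116):
        """One (n_bands - 1)-bit word per frame, from the signs of
        band-energy differences taken across both frequency and time."""
        f, t, S = signal.stft(x, fs, nperseg=int(frame * fs),
                              noverlap=int((frame - hop) * fs))
        edges = np.logspace(np.log10(300), np.log10(2000), n_bands + 1)
        E = np.array([(np.abs(S[(f >= lo) & (f < hi)]) ** 2).sum(axis=0)
                      for lo, hi in zip(edges[:-1], edges[1:])])
        D = np.diff(E, axis=0)           # difference across bands
        bits = np.diff(D, axis=1) > 0    # difference across time -> sign bits
        return bits.T                    # shape: (frames, n_bands - 1)

    def match(query_bits, ref_bits):
        """Slide the query over the reference; return the offset with the
        lowest bit error rate. A real system replaces this O(N*M) scan
        with an indexed lookup."""
        n = query_bits.shape[0]
        best_off, best_ber = None, 1.0
        for off in range(ref_bits.shape[0] - n + 1):
            ber = np.mean(query_bits != ref_bits[off:off + n])
            if ber < best_ber:
                best_off, best_ber = off, ber
        return best_off, best_ber

Robustness against broadcast distortions comes from keeping only the signs of energy differences: level changes and mild equalization shift the band energies but rarely flip their ordering, so the bit error rate between a distorted query and its reference stays low.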