Events of the AES: AES 112th Convention: Session F: MUSICAL ACOUSTICS

Saturday, May 11, 09:00 – 13:30 h
Chair: Neil Gilchrist, Ashtead, UK

09:00 h
F1 Statistical Analysis of Musical Sound Features Derived from Wavelet Representation – Bozena Kostek, Pawel Zwan, Technical University of Gdansk, Gdansk, Poland

The presented study aims to extract parameters from musical sounds that can be useful in the musical sound recognition process. For this purpose, time-frequency (wavelet transform) analysis employing various filters is performed on musical sounds representing twelve instrument classes. Three groups of instruments are taken into account, namely wind, string and percussive. Examples of wavelet analyses of various musical instrument sounds are presented. On this basis, a number of parameters were extracted and statistically analyzed. Correlated parameters are removed from the feature vector; in this way, the number of parameters in the feature vector can be reduced from dozens to a few of the most important ones. The feature vectors were then fed to the inputs of an artificial neural network, and classification experiments were performed. Furthermore, an originally developed Frequency Envelope Distribution method was applied to divide the musical signal into harmonic and inharmonic content. These signals were also parameterized and used in recognition experiments. Selected experimental results are presented, and the conclusions derived from them are included in the paper.
Convention Paper 5523
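For F1, the step of removing correlated parameters from the feature vector can be sketched as a greedy pass over a correlation matrix. This is a minimal illustration, not the authors' procedure: the absolute Pearson correlation, the 0.9 threshold and the feature names are assumptions chosen for the example.

```python
import numpy as np

def prune_correlated_features(X, names, threshold=0.9):
    """Greedily keep features whose |Pearson r| with every already-kept feature is <= threshold.

    X: (n_sounds, n_features) matrix, one row per analysed sound.
    threshold: hypothetical cut-off, not a value taken from the paper.
    """
    corr = np.abs(np.corrcoef(X, rowvar=False))  # feature-by-feature |r|
    kept = []
    for j in range(X.shape[1]):
        # feature j survives only if it is not strongly correlated with any kept feature
        if all(corr[j, k] <= threshold for k in kept):
            kept.append(j)
    return X[:, kept], [names[j] for j in kept]

# Usage with synthetic data and hypothetical feature names:
rng = np.random.default_rng(0)
a = rng.normal(size=200)
b = rng.normal(size=200)
X = np.column_stack([a, 2.0 * a + 0.01 * rng.normal(size=200), b])
X_reduced, kept_names = prune_correlated_features(X, ["centroid", "centroid_scaled", "decay"])
```

The second column is almost a scaled copy of the first, so it is dropped; which member of a correlated pair survives depends only on column order here, whereas a statistical analysis like the one described would rank parameters by their usefulness first.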

09:30 h
F2 PCM to MIDI Transposition – Luis Martins, Aníbal Ferreira, INESC Porto, Porto, Portugal

The extraction of semantic features from audio signals is a research area of great interest for the automatic classification, indexing and retrieval of audio material. Furthermore, real-time interactive systems that use sound as an input may also benefit from such technologies, achieving better interaction with “real-world” events. This paper addresses the analysis of acoustic music signals and, in particular, their transcription to MIDI (Musical Instrument Digital Interface). The development and implementation of an automatic transcription system are presented, and the results obtained are evaluated and discussed.
Convention Paper 5524
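The final step of any PCM-to-MIDI transcription such as F2 is mapping an estimated fundamental frequency to a MIDI note number. The abstract does not describe the authors' pipeline; the sketch below shows only the standard equal-tempered mapping $n = 69 + 12\log_2(f/440)$, assuming A4 = 440 Hz tuning.

```python
import math

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def hz_to_midi(f_hz):
    """Nearest equal-tempered MIDI note number for a frequency (A4 = 440 Hz = note 69)."""
    return int(round(69 + 12 * math.log2(f_hz / 440.0)))

def midi_name(n):
    # MIDI note 60 is middle C, conventionally written C4
    return f"{NOTE_NAMES[n % 12]}{n // 12 - 1}"
```

For example, an estimate of 261.63 Hz maps to note 60 (C4). A real transcriber would also have to decide note onsets, durations and velocities before emitting MIDI events.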

10:00 h
F3 Pattern Recognition of Piano Chords Based on Physical Model – Luis Ortiz-Berenguer, Francisco Casajús-Quirós, Polytechnical University of Madrid, Madrid, Spain

A method for identifying piano notes and chords is presented. A simplified piano model based on physical properties has been developed to generate the spectral patterns used to identify notes and chords. The generated patterns are correlated with the chord to be identified. Results obtained with typical values for the model parameters are satisfactory, and they improve further when the model is trained. Three-note piano chords have been recognized successfully.
Convention Paper 5525
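The pattern-correlation idea in F3 can be sketched by synthesizing a magnitude-spectrum pattern for every candidate chord and picking the one with the highest normalized correlation with the observed spectrum. The authors' model parameters and training are not given in the abstract; this sketch assumes the textbook stiff-string partial frequencies $f_k = k f_0\sqrt{1 + Bk^2}$, a hypothetical inharmonicity coefficient `B`, 1/k partial amplitudes, and cosine similarity as the correlation measure.

```python
import itertools
import numpy as np

FS = 8000        # hypothetical analysis sample rate (Hz)
N_FFT = 4096     # hypothetical FFT size
N_PARTIALS = 8   # partials per note in the synthetic pattern
B = 1e-4         # hypothetical inharmonicity coefficient (stiff-string model)

def note_freq(midi):
    return 440.0 * 2 ** ((midi - 69) / 12)

def note_pattern(midi):
    """Spectral pattern of one note: partials at k*f0*sqrt(1 + B*k^2) with amplitude 1/k."""
    spec = np.zeros(N_FFT // 2 + 1)
    f0 = note_freq(midi)
    for k in range(1, N_PARTIALS + 1):
        fk = k * f0 * np.sqrt(1 + B * k * k)
        bin_index = int(round(fk * N_FFT / FS))
        if bin_index < spec.size:  # skip partials above Nyquist
            spec[bin_index] += 1.0 / k
    return spec

def chord_pattern(notes):
    return sum(note_pattern(n) for n in notes)

def identify_chord(observed, candidate_notes, size=3):
    """Note combination whose pattern has the highest normalized correlation with `observed`."""
    best, best_score = None, -np.inf
    for combo in itertools.combinations(candidate_notes, size):
        p = chord_pattern(combo)
        score = np.dot(p, observed) / (np.linalg.norm(p) * np.linalg.norm(observed))
        if score > best_score:
            best, best_score = combo, score
    return best, best_score
```

Exhaustive search over all three-note combinations is only practical for a small candidate range; "training" the model, as the abstract mentions, would correspond to fitting parameters such as `B` and the partial amplitudes to recordings.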

10:30 h
F4 A Multi-Rate Approach to Instrument Body Modeling for Real-Time Sound Synthesis Applications – Balazs Bank¹, Giovanni De Poli², Laszlo Sujbert¹ - ¹Budapest University of Technology and Economics, Budapest, Hungary; ²University of Padua, Padua, Italy

Real-time physical modeling may have important applications in audio coding and in the music industry. Nevertheless, no efficient solutions have been proposed for modeling the radiation of the string instrument body when commuted synthesis cannot be applied. In this novel multi-rate approach, the string signal is split into two frequency bands. The lower band is filtered by a long FIR filter running at a considerably lower sampling rate, precisely synthesizing the body impulse response up to 2 kHz. In the high-frequency band, only the overall magnitude response of the body is modeled, using a low-order filter. The filters are calculated from measurements of real instruments. This enables real-time physical modeling of string instrument tones with high sound quality.
Convention Paper 5526
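A back-of-the-envelope calculation shows why running the long body filter of F4 at a reduced rate pays off. A direct-form FIR covering $T$ seconds of impulse response needs $T f_s$ taps and runs $f_s$ times per second, so its cost is $T f_s^2$ multiply-accumulates per second; decimating by $D$ divides both factors by $D$. The sampling rate, decimation factor and impulse-response length below are illustrative assumptions, not values from the paper.

```python
def fir_macs_per_second(ir_seconds, fs):
    """Multiply-accumulates per second for a direct-form FIR spanning `ir_seconds` at rate `fs`."""
    taps = int(round(ir_seconds * fs))
    return taps * fs

FS = 44_100       # assumed full audio rate (Hz)
DECIMATION = 10   # fs / 10 = 4410 Hz, whose Nyquist (2205 Hz) still covers the 2 kHz band
IR_SECONDS = 0.1  # hypothetical length of body impulse response to reproduce

full = fir_macs_per_second(IR_SECONDS, FS)               # long FIR at the full rate
low = fir_macs_per_second(IR_SECONDS, FS / DECIMATION)   # same FIR in the decimated low band
print(full, low, full / low)
```

The saving is roughly a factor of $D^2 = 100$ under these assumptions; the sketch ignores the cost of the band-splitting, decimation and interpolation filters and of the low-order high-band filter, which a real implementation must add back.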

11:00 h
F5 Pitch-Locking Monophonic Music Analysis – Giuliano Monti, Mark Sandler, Queen Mary University of London, London, UK

This paper presents a signal-adaptive analysis technique for transcribing monophonic sounds. Unlike other models, which segment audio by relying on time-domain onset analysis, this model principally exploits pitch information: once detected, the pitch is locked. The structure of a musical note, i.e. its harmonic frequency structure and time-envelope model, is exploited to segment and transcribe the signal. The system is inspired by the Integrated Processing and Understanding of Signals (IPUS) system, in which the abstract explanation and the best front-end configuration are searched iteratively. Onsets and pitch are searched in two different domains and integrated with the system's knowledge to give a coherent interpretation of the signal. The system successfully transcribes material ranging from fast trumpet riffs to long sustained violin vibrato.
Convention Paper 5527
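The pitch-locking idea in F5 can be illustrated with a simple segmenter that runs over per-frame pitch estimates: when a note starts, its pitch is locked to the nearest semitone, and following frames stay in the same note while they remain within a tolerance of the locked pitch, so vibrato does not split the note. This is a hypothetical simplification: the tolerance, the fractional-MIDI input and the use of `None` for unvoiced frames are assumptions, and the paper's IPUS-style search and time-envelope model are not reproduced.

```python
from typing import List, Optional, Tuple

def pitch_locked_segments(
    frame_pitches: List[Optional[float]], tolerance: float = 0.5
) -> List[Tuple[int, int, int]]:
    """Split per-frame pitch estimates (fractional MIDI, None = unvoiced) into notes.

    Returns (start_frame, end_frame, midi_note) tuples with end_frame exclusive.
    `tolerance` (in semitones) is a hypothetical value.
    """
    segments = []
    locked, start = None, 0
    for i, p in enumerate(frame_pitches):
        if locked is not None and (p is None or abs(p - locked) > tolerance):
            segments.append((start, i, locked))  # pitch left the lock: close the note
            locked = None
        if locked is None and p is not None:
            locked, start = round(p), i  # lock onto the new pitch
    if locked is not None:
        segments.append((start, len(frame_pitches), locked))
    return segments
```

For instance, the frames `[None, 60.1, 59.8, 60.3, 59.7, 62.0, 62.2, None, 64.9]` yield three notes (60, 62 and 65); the wobble around 60 stays inside one note because each frame is within half a semitone of the locked pitch.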

11:30 h
F6 Pitch Estimation for the Separation of Musical Sounds – Julie Rosier, Yves Grenier, ENST, Paris, France

We present a pitch estimation method for the separation of musical sounds. Starting from a sinusoids-plus-autoregressive-noise model of the composite signal, we successively estimate the fundamental frequencies using the Minimum Description Length (MDL) principle, also known as the Rissanen criterion. Being equivalent to a penalized log-likelihood, this criterion also makes it possible to detect the onset and offset of each sound. Detection is based on the likelihood gap between the estimated model and a simple autoregressive model. Several simulations with single or multiple musical sounds illustrate the effectiveness of the method.
Convention Paper 5528
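The MDL trade-off described in F6 can be sketched for a single sound: fit harmonic sinusoids by least squares for every candidate $f_0$ and harmonic count $h$, and score each fit with $\frac{N}{2}\log\hat\sigma^2 + \frac{p}{2}\log N$. For simplicity the noise is assumed white Gaussian rather than autoregressive as in the paper, only one fundamental is estimated, and a grid search over $f_0$ is used; the grid, `max_harm` and the parameter count $p = 2h + 2$ (amplitudes, $f_0$, variance) are assumptions of this sketch.

```python
import numpy as np

def harmonic_basis(n, fs, f0, n_harm):
    """Columns cos(2*pi*k*f0*t), sin(2*pi*k*f0*t) for k = 1..n_harm, interleaved per harmonic."""
    t = np.arange(n) / fs
    phase = 2 * np.pi * f0 * np.outer(t, np.arange(1, n_harm + 1))  # shape (n, n_harm)
    return np.stack([np.cos(phase), np.sin(phase)], axis=2).reshape(n, 2 * n_harm)

def mdl_score(rss, n, n_params):
    # Gaussian negative log-likelihood with the ML variance (constants dropped) plus (p/2) log N
    return 0.5 * n * np.log(rss / n) + 0.5 * n_params * np.log(n)

def mdl_pitch(x, fs, f0_grid, max_harm=6):
    """Return (f0, n_harmonics, sound_present) minimizing MDL over a grid of f0 and harmonic counts."""
    n = len(x)
    noise_only = mdl_score(float(x @ x), n, 1)  # competing model: white noise, variance only
    best_score, best_f0, best_h = np.inf, None, None
    for f0 in f0_grid:
        basis = harmonic_basis(n, fs, f0, max_harm)
        for h in range(1, max_harm + 1):
            if h * f0 >= fs / 2:
                break
            A = basis[:, : 2 * h]
            coef, *_ = np.linalg.lstsq(A, x, rcond=None)
            resid = x - A @ coef
            # parameters: 2h sinusoid amplitudes, f0 and the noise variance
            score = mdl_score(float(resid @ resid), n, 2 * h + 2)
            if score < best_score:
                best_score, best_f0, best_h = score, float(f0), h
    # a sound is declared present only if the harmonic model wins the likelihood/penalty trade-off
    return best_f0, best_h, best_score < noise_only
```

The penalty term is what resolves octave errors here: a sub-octave $f_0/2$ with twice as many harmonics fits the signal equally well but pays for the extra parameters. Applying `mdl_pitch` frame by frame and watching when the harmonic model stops beating the noise-only model gives a crude onset and offset detector in the spirit of the abstract.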

12:00 h
F7 Intelligent Audio Source Separation using ICA – Nikolaos Mitianoudis, Mike Davies, Queen Mary University of London, London, UK

The authors introduce the idea of performing Intelligent ICA to focus on and separate a specific instrument, voice or sound source of interest. This is achieved by incorporating into the ICA model high-level probabilistic priors that characterize each instrument or voice. For instrument modeling, we experimented with various feature sets previously used for instrument or speaker recognition. A Gaussian Mixture Model was trained in advance for each instrument. The order of the feature vector, the number of Gaussian mixtures and the length of the training audio were kept to reasonable minimum levels.
Convention Paper 5529
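The role of the per-instrument Gaussian Mixture Models in F7 can be illustrated by scoring feature vectors under a diagonal-covariance GMM. Note that this sketch only selects, after separation, the output whose features best match the target instrument; that post hoc selection is a simpler stand-in for the paper's approach of building the prior into the ICA model itself. The GMM parameters are assumed to be already trained (EM training is not shown), and `pick_target_source` is a hypothetical helper.

```python
import numpy as np

def gmm_log_likelihood(X, weights, means, variances):
    """Average per-frame log-likelihood of X (frames x dims) under a diagonal-covariance GMM.

    weights: (M,), means: (M, D), variances: (M, D).
    """
    X = np.atleast_2d(X)
    diff = X[:, None, :] - means[None, :, :]  # (frames, M, D)
    # log N(x | mu_m, diag(var_m)) for every frame and component -> (frames, M)
    log_norm = -0.5 * (np.log(2 * np.pi * variances).sum(axis=1)[None, :]
                       + (diff ** 2 / variances[None, :, :]).sum(axis=2))
    log_weighted = log_norm + np.log(weights)[None, :]
    # log-sum-exp over mixture components, computed stably
    m = log_weighted.max(axis=1, keepdims=True)
    per_frame = m[:, 0] + np.log(np.exp(log_weighted - m).sum(axis=1))
    return float(per_frame.mean())

def pick_target_source(source_features, target_gmm):
    """Index of the separated source whose features best fit the target instrument's GMM."""
    scores = [gmm_log_likelihood(F, *target_gmm) for F in source_features]
    return int(np.argmax(scores)), scores
```

Averaging the log-likelihood over frames keeps the score comparable between sources of different lengths; keeping the feature order and number of mixtures small, as the abstract notes, keeps this evaluation cheap.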

12:30 h
F8 Improved Time-Scaling of Musical Audio Using Phase Locking at Transients – Chris Duxbury, Mike Davies, Mark Sandler, Queen Mary University of London, London, UK

We present a method that uses temporal information in musical audio to considerably improve pitch-preserving time-scaling. This work builds on a signal model in which the original signal is split into transient and steady-state components, based on an adaptive multi-resolution analysis. By considering only the transient signal content, temporal segmentation is achieved with a much higher degree of accuracy than with standard onset detection algorithms. Only the segmented steady-state regions are then stretched, while phase is locked in the temporally masked regions at transients. Despite local variations in the stretching factor, rhythm is maintained globally, yielding perceptually very high-quality results for a range of complex polyphonic musical signals at a low computational cost.
Convention Paper 5530
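How F8 can stretch only the steady-state regions yet keep rhythm globally is a matter of bookkeeping: if each inter-onset segment of length $L$ begins with a transient of length $t$ that is copied unscaled, its steady part must grow from $L - t$ to $\alpha L - t$ so that every onset lands at $\alpha$ times its original position. The sketch below computes only this plan; the fixed transient length is a hypothetical simplification (the paper derives transients from a multi-resolution analysis), and the actual phase-vocoder stretching and phase locking are not shown.

```python
def steady_state_stretch(onsets, total_len, transient_len, alpha):
    """Plan per-segment stretching so transients stay unscaled and onsets move to alpha * position.

    onsets: sorted onset positions in samples, starting at 0; total_len: signal length in samples.
    transient_len: hypothetical fixed length of the transient region after each onset.
    Returns (transient, steady_in, steady_out, factor) for each inter-onset segment.
    """
    bounds = list(onsets) + [total_len]
    plan = []
    for start, end in zip(bounds[:-1], bounds[1:]):
        seg = end - start
        t = min(transient_len, seg)   # transient part, copied without stretching
        steady_in = seg - t
        steady_out = alpha * seg - t  # steady part absorbs the whole length change
        if steady_out < 0 or (steady_in == 0 and steady_out > 0):
            raise ValueError("segment too short for this stretch factor")
        factor = steady_out / steady_in if steady_in else 1.0
        plan.append((t, steady_in, steady_out, factor))
    return plan
```

With onsets at samples 0, 1000 and 2500, a 4000-sample signal, 100-sample transients and $\alpha = 1.5$, the first steady region is stretched by about 1.556 and the others by about 1.536, yet the output onsets fall at 1500 and 3750 and the total length is 6000 samples: local factors differ while the global rhythm is preserved.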

13:00 h
F9 Robust Sound Modeling for Song Detection in Broadcasted Audio – Eloi Batlle¹, Pedro Cano¹, Harald Mayer², Helmut Neuschmied² - ¹Pompeu Fabra University, Barcelona, Spain; ²Joanneum Research Forschungsgesellschaft mbH, Graz, Austria

This paper describes the development of an audio fingerprint called AudioDNA, designed to be robust against several distortions, including those related to radio broadcasting. A complete system is described, which also covers a fast and efficient method for comparing observed fingerprints against a huge database of reference fingerprints. Promising results achieved with the first prototype system, monitoring music titles as well as commercials, are presented.
Convention Paper 5531
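The database lookup in F9 can be illustrated under the assumption that a fingerprint is a sequence of discrete symbols. The abstract does not describe AudioDNA's matching method, so this sketch uses a generic technique instead: an inverted index of length-n symbol sequences votes for (reference, offset) alignments, and only the best-voted candidates are verified symbol by symbol, tolerating a fraction of mismatches caused by distortion. The n-gram length, mismatch ratio and reference names are hypothetical.

```python
from collections import defaultdict

def build_index(references, n=4):
    """Inverted index: every length-n symbol sequence -> list of (reference_id, position)."""
    index = defaultdict(list)
    for ref_id, seq in references.items():
        for i in range(len(seq) - n + 1):
            index[tuple(seq[i:i + n])].append((ref_id, i))
    return index

def identify(query, references, index, n=4, max_mismatch_ratio=0.2, n_candidates=10):
    """Best (reference_id, offset, mismatches) for a query fingerprint, or None if nothing matches."""
    votes = defaultdict(int)
    for j in range(len(query) - n + 1):
        for ref_id, i in index.get(tuple(query[j:j + n]), []):
            votes[(ref_id, i - j)] += 1  # an n-gram hit implies alignment offset i - j
    best = None
    for (ref_id, offset), _ in sorted(votes.items(), key=lambda kv: -kv[1])[:n_candidates]:
        ref = references[ref_id]
        if offset < 0 or offset + len(query) > len(ref):
            continue
        mismatches = sum(a != b for a, b in zip(query, ref[offset:offset + len(query)]))
        if mismatches <= max_mismatch_ratio * len(query) and (best is None or mismatches < best[2]):
            best = (ref_id, offset, mismatches)
    return best
```

For example, with references `"ABCDEFGHIJKLMNOPQRST"` and its reverse, the distorted query `"FGHXJKLMNO"` is matched to the first reference at offset 5 with one mismatched symbol. The index turns a scan of the whole database into a lookup per n-gram, which is the property a fast search over a huge reference set needs.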