Monday, May 22, 09:00 — 11:00 (Salon 2+3 Rome)
John Mourjopoulos (Chair)
P15-01 Close Miking Empirical Practice Verification: A Source Separation Approach
Stylianos Ioannis Mimilakis (Presenting Author), Konstantinos Drossos (Author), Andreas Floros (Author), Gerald Schuller (Author), Tuomas Virtanen (Author)
Close miking is a widely employed practice of placing a microphone very near the sound source in order to capture more direct sound and minimize any pickup of ambient sound, including other, concurrently active sources. It has been used by the audio engineering community for decades, based on a number of empirical rules that evolved during recording practice itself. But can this empirical knowledge and close miking practice be systematically verified? In this work we address this question with an analytic methodology that employs techniques and metrics originating from the sound source separation evaluation field. In particular, we apply a quantitative analysis of the source separation capabilities of the close miking technique. The analysis is applied to a recording dataset obtained at multiple positions in a typical music hall, at multiple distances between the microphone and the sound source, with multiple microphone types, and with multiple level differences between the sound source and the ambient acoustic component. For all of the above cases we calculate the Source-to-Interference Ratio (SIR) metric. The results clearly demonstrate an optimum close-miking performance that matches the current empirical knowledge of professional audio recording.
Convention Paper 9760
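A minimal sketch of the kind of SIR computation the abstract refers to, under the assumption that the direct close-miked signal and the ambient/interfering pickup are available as separate, time-aligned tracks. This is a simplified energy-ratio form of SIR, not the authors' evaluation code; the full BSS Eval definition (e.g., mir_eval.separation.bss_eval_sources) returns SIR among its metrics and would be the more rigorous choice. The distance sweep and ambient scaling below are purely illustrative.

```python
import numpy as np

def sir_db(direct, ambient, eps=1e-12):
    """Simple energy-based Source-to-Interference Ratio in dB.

    direct  : mono signal of the close-miked source alone
    ambient : concurrent ambient/interfering content picked up at the
              same microphone position (same length, time-aligned)
    """
    return 10.0 * np.log10(np.sum(direct ** 2) / (np.sum(ambient ** 2) + eps))

if __name__ == "__main__":
    # Hypothetical usage: report SIR for a sweep of microphone distances.
    rng = np.random.default_rng(0)
    fs = 48_000
    source = rng.standard_normal(fs)                       # stand-in for the direct sound
    for dist_m in (0.1, 0.5, 1.0, 2.0):
        ambient = 0.2 * dist_m * rng.standard_normal(fs)   # toy ambient level grows with distance
        print(f"{dist_m:4.1f} m  SIR = {sir_db(source, ambient):5.1f} dB")
```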
P15-02 Audio System Spatial Image Evaluation via Binaural Feature Classification
Gavriil Kamaris (Presenting Author), Stamatis Karlos (Author), Dimitris Koutsaidis (Author), John Mourjopoulos (Author), Stergios Terpinas (Author)
A method for evaluating different audio systems with respect to their spatial reproduction accuracy is described, based on binaural auditory feature classification. The classifier is trained to act as an expert listener judging system spatial quality and achieves high accuracy for a reference ideal system under anechoic conditions. The trained classifier is then employed to assess different suboptimal reproduction setups and listening conditions. Spatial accuracy is assessed against this reference in terms of the panned image, image localization accuracy, and the spread of the sweet-spot area. For 2-channel stereo reproduction, the study used loudspeakers of different directivity under anechoic or varying reverberant room conditions. The work also assesses the relative effects of upmixing stereo for 5-channel reproduction.
Convention Paper 9761
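The sketch below illustrates the general pattern of binaural feature classification described in the abstract: extract interaural cues from binaural renderings, train a classifier on an ideal reference system, then read the accuracy drop on other systems as a loss of spatial fidelity. The STFT-domain ILD/IPD features and the RandomForest classifier are hypothetical stand-ins, not the auditory model or classifier used in the paper.

```python
import numpy as np
from scipy.signal import stft
from sklearn.ensemble import RandomForestClassifier

def binaural_features(left, right, fs, nperseg=2048):
    """Per-bin interaural level difference (ILD) and interaural phase
    difference (IPD) statistics, pooled over time into one feature vector."""
    _, _, L = stft(left, fs, nperseg=nperseg)
    _, _, R = stft(right, fs, nperseg=nperseg)
    ild = 20 * np.log10((np.abs(L) + 1e-9) / (np.abs(R) + 1e-9))  # dB, per bin and frame
    ipd = np.angle(L * np.conj(R))                                 # radians, per bin and frame
    return np.concatenate([ild.mean(1), ild.std(1),
                           np.cos(ipd).mean(1), np.sin(ipd).mean(1)])

# Hypothetical training: rows of X are feature vectors from binaural renderings
# of the reference system, y are the intended panning directions.
#   clf = RandomForestClassifier(n_estimators=200).fit(X_train, y_train)
# A drop in clf.score(...) on recordings of a real loudspeaker/room setup is
# then interpreted as reduced spatial reproduction accuracy.
```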
P15-03 Long-term Average Spectrum in Popular Music and its Relation to the Level of the Percussion
Anders Elowsson (Presenting Author), Anders Friberg (Author)
The spectral distribution of music audio has an important influence on listener perception, but large-scale characterizations are lacking. Therefore, the long-term average spectrum (LTAS) was analyzed for a large dataset of popular music. The mean LTAS was computed, visualized, and then approximated with two quadratic fittings. The fittings were subsequently used to derive the spectrum slope. By applying harmonic/percussive source separation, the relationship between LTAS and percussive prominence was investigated. A clear relationship was found; tracks with more percussion have a relatively higher LTAS in the bass and high frequencies. We show how this relationship can be used to improve targets in automatic equalization. Furthermore, we assert that variations in LTAS between genres are mainly a side effect of percussive prominence.
Convention Paper 9762
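A minimal sketch of the general recipe the abstract describes: a Welch-averaged long-term spectrum, a quadratic fit over log-frequency, and a slope read off from the fit. The frequency range, normalization, and fitting details of the paper are not reproduced here; the band limits and the midpoint-slope convention below are assumptions for illustration only.

```python
import numpy as np
from scipy.signal import welch

def ltas_db(x, fs, nperseg=4096):
    """Long-term average spectrum: Welch-averaged power spectrum in dB."""
    f, pxx = welch(x, fs, nperseg=nperseg)
    return f[1:], 10 * np.log10(pxx[1:] + 1e-20)   # drop the DC bin

def spectrum_slope(f, ltas, fmin=100.0, fmax=10_000.0):
    """Fit a quadratic to the LTAS over log2-frequency and return the slope
    (derivative of the fit) at the band's log-midpoint, in dB per octave."""
    sel = (f >= fmin) & (f <= fmax)
    logf = np.log2(f[sel])
    a, b, _ = np.polyfit(logf, ltas[sel], 2)       # ltas ≈ a*logf**2 + b*logf + c
    mid = logf.mean()
    return 2 * a * mid + b
```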
P15-04 Efficient Music Identification Approach Based on Local Spectrogram Image Descriptors
Massimiliano Zanoni (Presenting Author), Paolo Bestagini (Author), Antonio Canclini (Author), Stefano Lusardi (Author), Augusto Sarti (Author), Stefano Tubaro (Author)
The diffusion of large music collections has created the need for algorithms enabling fast song retrieval from query audio excerpts. This is the case for online media-sharing platforms that may want to detect copyrighted material. In this paper we start from a previously proposed state-of-the-art algorithm for robust music matching based on spectrogram comparison that leverages computer vision concepts. We show that it is possible to further optimize this algorithm by exploiting more recent image processing techniques and by carrying out the analysis on limited temporal windows, while still achieving accurate matching performance. The proposed solution is validated on a dataset of 800 songs, reporting an 80% decrease in computational complexity for an accuracy loss of only about 1%.
Convention Paper 9763
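A minimal sketch of the underlying idea of matching songs by treating spectrogram excerpts as images and comparing local image descriptors. The specific descriptors, windowing, and indexing used in the paper are not reproduced; ORB features and cross-checked Hamming matching are illustrative choices only, as are the STFT and window-length parameters.

```python
import numpy as np
import cv2
import librosa

def spectrogram_image(path, sr=22_050, duration=20.0):
    """Log-magnitude spectrogram of a limited temporal window, rescaled to
    uint8 so OpenCV can treat it as a grayscale image."""
    y, _ = librosa.load(path, sr=sr, duration=duration, mono=True)
    S_db = librosa.amplitude_to_db(np.abs(librosa.stft(y, n_fft=1024, hop_length=256)),
                                   ref=np.max)
    return cv2.normalize(S_db, None, 0, 255, cv2.NORM_MINMAX).astype(np.uint8)

def match_score(query_img, ref_img, max_feats=500):
    """Number of cross-checked ORB descriptor matches between two spectrogram images."""
    orb = cv2.ORB_create(nfeatures=max_feats)
    _, d1 = orb.detectAndCompute(query_img, None)
    _, d2 = orb.detectAndCompute(ref_img, None)
    if d1 is None or d2 is None:
        return 0
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    return len(matcher.match(d1, d2))

# Hypothetical retrieval: rank all database songs by match_score(query, ref)
# and return the best-scoring candidate.
```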