Friday, September 30, 9:00 am — 10:30 am (Rm 409B)
Chair:
Agnieszka Roginska, New York University - New York, NY, USA
P9-1 The Audio Definition Model—A Flexible Standardized Representation for Next Generation Audio Content in Broadcasting and Beyond—Simone Füg, Fraunhofer Institute for Integrated Circuits IIS - Erlangen, Germany; David Marston, BBC R&D - London, UK; Scott Norcross, Dolby Laboratories - San Francisco, CA, USA
Audio for broadcasting around the world is evolving towards new immersive and personalized audio formats, which require accompanying metadata along with the audio essence. This paper introduces the Audio Definition Model (ADM), specified in Recommendation ITU-R BS.2076 (and EBU Tech 3364): a generalized, XML-based metadata model that can be used to define the required metadata. The ADM can describe channel-based, object-based, and scene-based audio in a formalized way. It can be embedded in RIFF/WAV-based files such as BW64 (Recommendation ITU-R BS.2088) and can therefore be deployed in RIFF/WAV-based applications handling immersive content while maintaining compatibility with legacy content. This enables the production and exchange of audio programs in these new formats.
Convention Paper 9626
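To illustrate the kind of structure the ADM prescribes, the sketch below uses Python's standard xml.etree to build a minimal ADM-style fragment for a single audio object. The element and attribute names follow the ADM vocabulary (audioFormatExtended, audioObject, audioChannelFormat, audioBlockFormat, position), but the IDs, timing, and coordinate values are illustrative assumptions rather than a conformant BS.2076 document, and a real workflow would carry this metadata inside the appropriate BW64 chunk.

```python
# Illustrative sketch only: a minimal ADM-like XML fragment for one audio object.
# Element and attribute names follow the ADM vocabulary, but the IDs, timing,
# and coordinate values are assumptions, not a conformant ITU-R BS.2076 document.
import xml.etree.ElementTree as ET

root = ET.Element("audioFormatExtended")

# One object-based element referencing a pack format (IDs are made up).
obj = ET.SubElement(root, "audioObject",
                    audioObjectID="AO_1001", audioObjectName="Dialogue")
ET.SubElement(obj, "audioPackFormatIDRef").text = "AP_00031001"

chan = ET.SubElement(root, "audioChannelFormat",
                     audioChannelFormatID="AC_00031001",
                     audioChannelFormatName="DialogueObject",
                     typeDefinition="Objects")

# A block format carrying (here static) position metadata for the object.
block = ET.SubElement(chan, "audioBlockFormat",
                      audioBlockFormatID="AB_00031001_00000001",
                      rtime="00:00:00.00000", duration="00:00:05.00000")
for coord, value in (("azimuth", "-30.0"), ("elevation", "0.0"), ("distance", "1.0")):
    ET.SubElement(block, "position", coordinate=coord).text = value

print(ET.tostring(root, encoding="unicode"))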
P9-2 Development of Semantic Scales for Music Mastering—Esben Skovenborg, TC Electronic - Risskov, Denmark
Mastering is the last stage of a music production and entails modifications of the music's spectral and dynamic properties. This paper presents the development of a set of semantic scales to characterize the (change in) perceptual properties of the sound associated with mastering. An experiment was conducted with audio engineers as subjects. Verbal elicitation and refinement procedures resulted in a list of 30 attributes. Next, 70 unmastered music segments were rated on scales corresponding to the attributes. Based on clustering and statistics of the responses, groups of similar and opposite attributes were formed. The outcome was a set of seven bipolar semantic scales. These scales could be used in semantic differentiation to rate typical alterations of sound caused or desired by mastering.
Convention Paper 9627
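As a rough illustration of the attribute-grouping step described in the abstract above, the sketch below correlates attribute ratings across music segments and flags strongly positive correlations as near-synonyms and strongly negative correlations as candidate poles of a bipolar scale. The attribute names, data, and threshold are hypothetical stand-ins; the paper's actual elicitation, clustering, and statistical procedures are not reproduced here.

```python
# Hypothetical sketch of grouping similar/opposite attributes from ratings.
# Attribute names, data, and threshold are assumptions for illustration only.
import numpy as np

rng = np.random.default_rng(0)
attributes = ["bright", "dull", "punchy", "flat", "warm", "thin"]

# Stand-in ratings: 70 music segments rated on 6 attributes, built from three
# latent dimensions so that some attribute pairs are near-synonyms or opposites.
latent = rng.normal(size=(70, 3))
ratings = np.column_stack([
    latent[:, 0], -latent[:, 0],   # "bright" vs. "dull"
    latent[:, 1], -latent[:, 1],   # "punchy" vs. "flat"
    latent[:, 2], -latent[:, 2],   # "warm" vs. "thin"
]) + 0.3 * rng.normal(size=(70, 6))

# Correlate attributes across segments: strong positive correlation suggests
# similar attributes, strong negative correlation suggests opposite poles.
corr = np.corrcoef(ratings, rowvar=False)

threshold = 0.6
for a in range(len(attributes)):
    for b in range(a + 1, len(attributes)):
        if corr[a, b] > threshold:
            print(f"similar pair: {attributes[a]} ~ {attributes[b]}")
        elif corr[a, b] < -threshold:
            print(f"bipolar pair: {attributes[a]} <-> {attributes[b]}")
```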
P9-3 A Hierarchical Sonification Framework Based on Convolutional Neural Network Modeling of Musical Genre—Shijia Geng, University of Miami - Coral Gables, FL, USA; Gang Ren, University of Miami - Coral Gables, FL, USA; Mitsunori Ogihara, University of Miami - Coral Gables, FL, USA
Convolutional neural networks achieve satisfactory discriminative performance on various music-related tasks. However, the models operate as "black boxes," so their internal representations are opaque to manual interaction. In this paper a hierarchical sonification framework with a musical genre modeling module and a sample-level sonification module is implemented for aural interaction. The modeling module trains a convolutional neural network on musical signal segments with genre labels. The sonification module then performs sample-level modification according to each convolutional layer: lower sonification levels produce auralized pulses, while higher sonification levels produce audio signals similar to the input musical signal. The use of the proposed framework is demonstrated with a musical style-morphing example.
Convention Paper 9628
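As a rough sketch of the kind of modeling module the abstract above describes, the code below defines and trains (for one step) a small 1-D convolutional network on labeled audio segments; the per-layer activations in `features` are the sort of intermediate representations a sample-level sonification module could auralize. The layer sizes, segment length, genre count, and use of PyTorch are assumptions for illustration, not the paper's actual architecture or sonification procedure.

```python
# Hypothetical sketch of a genre-modeling module: a small 1-D CNN trained on
# labeled musical signal segments. All sizes and hyperparameters are assumed.
import torch
import torch.nn as nn

class GenreCNN(nn.Module):
    def __init__(self, n_genres: int = 10):
        super().__init__()
        # Stacked convolutional layers; each layer's activations are candidate
        # material for per-level sonification.
        self.features = nn.Sequential(
            nn.Conv1d(1, 16, kernel_size=64, stride=4), nn.ReLU(),
            nn.Conv1d(16, 32, kernel_size=32, stride=4), nn.ReLU(),
            nn.Conv1d(32, 64, kernel_size=16, stride=4), nn.ReLU(),
        )
        self.pool = nn.AdaptiveAvgPool1d(1)
        self.classifier = nn.Linear(64, n_genres)

    def forward(self, x):
        h = self.features(x)                         # (batch, 64, time)
        return self.classifier(self.pool(h).squeeze(-1))

# One training step on a batch of 1-second segments at 16 kHz (stand-in data).
model = GenreCNN()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

segments = torch.randn(8, 1, 16000)                  # (batch, channels, samples)
labels = torch.randint(0, 10, (8,))                  # genre labels

loss = criterion(model(segments), labels)
optimizer.zero_grad()
loss.backward()
optimizer.step()
print(f"loss: {loss.item():.3f}")
```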