AES San Francisco 2012
Paper Session P12
P12 - Sound Analysis and Synthesis
Saturday, October 27, 2:30 pm — 6:00 pm (Room 122)
Chair: Jean Laroche, Audience, Inc.
P12-1 Drum Synthesis via Low-Frequency Parametric Modes and Altered Residuals—Haiying Xia, CCRMA, Stanford University - Stanford, CA, USA; Electrical Engineering, Stanford University - Stanford, CA, USA; Julius O. Smith, III, Stanford University - Stanford, CA, USA
Techniques are proposed for drum synthesis using a two-band source-filter model. A Butterworth lowpass/highpass band-split is used to separate a recorded “high tom” drum hit into low and high bands. The low band, containing the most salient modes of vibration, is downsampled and Poisson-windowed to accelerate its decay and facilitate mode extraction. A weighted equation-error method is used to fit an all-pole model—the “modal model”—to the first five modes of the low band in the case of the high tom. The modal model is removed from the low band by inverse filtering, and the resulting residual is taken as a starting point for excitation modeling in the low band. For the high band, low-order linear prediction (LP) is used to model the spectral envelope. The bands are resynthesized by feeding the residual signals to their respective all-pole forward filters, upsampling the low band, and summing. The modal model can be modulated to obtain the sound of different drums and other effects. The residuals can be altered to obtain the effects of different striking locations and striker materials.
Convention Paper 8762
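To make the signal chain in P12-1 concrete, here is a minimal Python sketch of the analysis path: Butterworth band split, downsampling and Poisson windowing of the low band, an all-pole fit, and inverse filtering to obtain the low-band residual. It is an illustration only: a plain autocorrelation LPC fit stands in for the paper's weighted equation-error modal fit, the high-band LP step is omitted, and the file name, cutoff frequency, filter orders, and decay constant are assumptions rather than the authors' settings.

```python
import numpy as np
from scipy.io import wavfile
from scipy.signal import butter, lfilter, resample_poly

# Hypothetical input recording of a "high tom" hit.
fs, x = wavfile.read("high_tom.wav")
x = x.astype(np.float64) / np.max(np.abs(x))

# Butterworth lowpass/highpass band split (illustrative 1 kHz cutoff).
b_lo, a_lo = butter(4, 1000, btype="low", fs=fs)
b_hi, a_hi = butter(4, 1000, btype="high", fs=fs)
low, high = lfilter(b_lo, a_lo, x), lfilter(b_hi, a_hi, x)

# Downsample the low band and apply a Poisson (exponential) window to
# accelerate its decay before fitting the modes.
low_ds = resample_poly(low, 1, 8)
fs_ds = fs // 8
tau = 0.05                                   # illustrative decay constant (s)
low_win = low_ds * np.exp(-np.arange(len(low_ds)) / (tau * fs_ds))

def allpole_fit(sig, order):
    """All-pole fit via the autocorrelation method (a stand-in for the
    paper's weighted equation-error fit)."""
    r = np.correlate(sig, sig, mode="full")[len(sig) - 1:len(sig) + order]
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    a = np.linalg.solve(R, r[1:order + 1])
    return np.concatenate(([1.0], -a))       # denominator polynomial A(z)

# Ten poles for roughly five modes; inverse filtering exposes the residual.
A = allpole_fit(low_win, 10)
residual_low = lfilter(A, [1.0], low_ds)

# Resynthesis of the low band: excite the modal (all-pole) filter with the
# residual and upsample; the LP-modeled high band would be added similarly.
low_resyn = resample_poly(lfilter([1.0], A, residual_low), 8, 1)
```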
P12-2 Drum Pattern Humanization Using a Recursive Bayesian Framework—Ryan Stables, Birmingham City University - Birmingham, UK; Cham Athwal, Birmingham City University - Birmingham, UK; Rob Cade, Birmingham City University - Birmingham, UK
In this study we discuss some of the limitations of Gaussian humanization and consider ways in which the articulation patterns exhibited by percussionists can be emulated using a probabilistic model. Prior and likelihood functions are derived from a dataset of professional drummers to create a series of empirical distributions. These are then used to independently modulate the onset locations and amplitudes of a quantized sequence, using a recursive Bayesian framework. Finally, we evaluate the performance of the model against sequences created with a Gaussian humanizer and sequences created with a Hidden Markov Model (HMM) using paired listening tests. We demonstrate that probabilistic models perform better than instantaneous Gaussian models when evaluated on a 4/4 rock beat at 120 bpm.
Convention Paper 8763
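As a rough illustration of the recursive Bayesian idea in P12-2, the sketch below modulates the onset times of a quantized 16th-note grid at 120 bpm: at each step a posterior over timing deviations is formed from the current belief and a likelihood conditioned on the previous deviation, a deviation is sampled, and the posterior becomes the next prior. The Gaussian prior and likelihood are placeholder assumptions standing in for the paper's empirical distributions derived from professional drummers, and amplitude modulation (handled analogously in the paper) is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)
grid = np.linspace(-0.03, 0.03, 121)         # candidate onset deviations (s)

def gaussian(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2)

# Placeholder empirical prior over timing deviations (the paper derives this
# from recordings of professional drummers).
prior = gaussian(grid, 0.0, 0.010)
prior /= prior.sum()

def likelihood(prev_dev):
    # Placeholder: deviations drift toward the previous one (serial correlation).
    return gaussian(grid, 0.7 * prev_dev, 0.006)

bpm, n_steps = 120.0, 16                     # one bar of 16th notes in 4/4
step = 60.0 / bpm / 4.0
belief, prev_dev = prior.copy(), 0.0
onsets = []
for k in range(n_steps):
    # Recursive Bayes: posterior is proportional to likelihood (given the
    # previous deviation) times the prior belief.
    posterior = likelihood(prev_dev) * belief
    posterior /= posterior.sum()
    prev_dev = rng.choice(grid, p=posterior) # sample this step's deviation
    onsets.append(k * step + prev_dev)
    belief = posterior                       # posterior becomes the next prior

print(np.round(onsets, 4))                   # humanized onset times (s)
```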
P12-3 Procedural Audio Modeling for Particle-Based Environmental Effects—Charles Verron, REVES/INRIA - Sophia-Antipolis, France; George Drettakis, REVES/INRIA - Sophia-Antipolis, France
We present a sound synthesizer dedicated to particle-based environmental effects, for use in interactive virtual environments. The synthesis engine is based on five physically-inspired basic elements (that we call sound atoms) that can be parameterized and stochastically distributed in time and space. Based on this set of atomic elements, models are presented for reproducing several environmental sound sources. Compared to pre-recorded sound samples, procedural synthesis provides extra flexibility to manipulate and control the sound source properties with physically-inspired parameters. In this paper the controls are used simultaneously to modify particle-based graphical models, resulting in synchronous audio/graphics environmental effects. The approach is illustrated with three models that are commonly used in video games: fire, wind, and rain. The physically-inspired controls simultaneously drive graphical parameters (e.g., distribution of particles, average particle velocity) and sound parameters (e.g., distribution of sound atoms, spectral modifications). The joint audio/graphics control results in a tightly coupled interaction between the two modalities that enhances the naturalness of the scene.
Convention Paper 8764
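The sketch below illustrates the "sound atom" idea from P12-3 with a rain example: each atom is a short band-passed noise burst, atoms are stochastically distributed in time (Poisson arrivals), and a single intensity parameter controls their rate, the same parameter that in a game engine could also drive the particle system's spawn rate. The atom shape, filter band, and rate mapping are illustrative assumptions, not values from the paper.

```python
import numpy as np
from scipy.signal import butter, lfilter

fs, duration, intensity = 44100, 3.0, 0.6    # intensity in [0, 1]
out = np.zeros(int(fs * duration))

def drop_atom(fs, length=0.01):
    """One rain-drop atom: band-passed noise with an exponential decay."""
    n = int(fs * length)
    burst = np.random.randn(n) * np.exp(-np.arange(n) / (0.002 * fs))
    b, a = butter(2, [2000, 8000], btype="band", fs=fs)
    return lfilter(b, a, burst)

# The intensity parameter maps to the atom rate; in an engine it could also
# drive the particle system's spawn rate for synchronized audio/graphics.
rate = 50 + 950 * intensity                  # drops per second
n_drops = np.random.poisson(rate * duration)
for t in np.random.uniform(0, duration, n_drops):
    atom = drop_atom(fs) * np.random.uniform(0.2, 1.0)
    i = int(t * fs)
    end = min(i + len(atom), len(out))
    out[i:end] += atom[:end - i]
```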
P12-4 Knowledge Representation Issues in Audio-Related Metadata Model Design—György Fazekas, Queen Mary University of London - London, UK; Mark B. Sandler, Queen Mary University of London - London, UK
In order for audio applications to interoperate, some agreement on how information is structured and encoded has to be in place within developer and user communities. This agreement can take the form of an industry standard or a widely adopted open framework consisting of conceptual data models expressed using formal description languages. There are several viable approaches to conceptualize audio-related metadata, and several ways to describe the conceptual models, as well as encode and exchange information. While emerging standards have already been proven invaluable in audio information management, it remains difficult to design or choose the model that is most appropriate for an application. This paper facilitates this process by providing an overview, focusing on differences in the conceptual models underlying audio metadata schemata.
Convention Paper 8765
P12-5 High-Level Semantic Metadata for the Control of Multitrack Adaptive Digital Audio Effects—Thomas Wilmering, Queen Mary University of London - London, UK; György Fazekas, Queen Mary University of London - London, UK; Mark B. Sandler, Queen Mary University of London - London, UK
Existing adaptive digital audio effects predominantly use low-level features in order to derive control data. These data do not typically correspond to high-level musicological or semantic information about the content. In order to apply audio transformations selectively on different musical events in a multitrack project, audio engineers and music producers have to resort to manual selection or annotation of the tracks in traditional audio production environments. We propose a new class of audio effects that uses high-level semantic audio features in order to obtain control data for multitrack effects. The metadata is expressed in RDF using several music and audio related Semantic Web ontologies and retrieved using the SPARQL query language.
Convention Paper 8766
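A hypothetical sketch of the workflow in P12-5 using rdflib: per-track metadata is stored as RDF, and a SPARQL query retrieves events (here, onsets on tracks annotated as snare drum) whose times can then schedule a multitrack effect. The file name, the ex: annotation property, and the exact use of the af:, event:, and tl: ontologies are assumptions for illustration, not necessarily the authors' modeling choices.

```python
import rdflib

g = rdflib.Graph()
g.parse("session_metadata.ttl", format="turtle")   # hypothetical metadata file

# Select onset events on tracks annotated as snare drum. The ex: annotation
# property and the use of af:/event:/tl: here are illustrative assumptions.
query = """
PREFIX af:    <http://purl.org/ontology/af/>
PREFIX event: <http://purl.org/NET/c4dm/event.owl#>
PREFIX tl:    <http://purl.org/NET/c4dm/timeline.owl#>
PREFIX ex:    <http://example.org/session#>

SELECT ?track ?onsetTime WHERE {
    ?track  ex:instrument  "snare" .
    ?onset  a              af:Onset ;
            event:time     ?t .
    ?t      tl:at          ?onsetTime .
}
"""

for row in g.query(query):
    # Each result could schedule, e.g., a transient-shaping effect on that
    # track at the given time.
    print(row.track, row.onsetTime)
```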
P12-6 On Accommodating Pitch Variation in Long Term Prediction of Speech and Vocals in Audio Coding—Tejaswi Nanjundaswamy, University of California, Santa Barbara - Santa Barbara, CA, USA; Kenneth Rose, University of California, Santa Barbara - Santa Barbara, CA, USA
Exploiting inter-frame redundancies is key to performance enhancement of delay-constrained perceptual audio coders. The long term prediction (LTP) tool was introduced in the MPEG Advanced Audio Coding standard, especially for the low delay mode, to capitalize on the periodicity in naturally occurring sounds by identifying a segment of previously reconstructed data as prediction for the current frame. However, speech and vocal content in audio signals is well known to be quasi-periodic, with small variations in pitch period that compromise the performance of the LTP tool. The proposed approach modifies LTP by introducing a single parameter of “geometric” warping, whereby past periodicity is geometrically warped to provide an adjusted prediction for the current samples. We also propose a three-stage parameter estimation technique, where an unwarped LTP filter is first estimated to minimize the mean squared prediction error; then filter parameters are complemented with the warping parameter and re-estimated within a small neighboring search space to retain the set of S best LTP parameters; and finally, a perceptual distortion-rate procedure is used to select, from the S candidates, the parameter set that minimizes the perceptual distortion. Objective and subjective evaluations substantiate the proposed technique’s effectiveness.
Convention Paper 8767
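The sketch below gives one loose reading of the warped long-term prediction in P12-6: the previously reconstructed signal is read back with a lag that drifts geometrically across the frame, so a slowly changing pitch period still aligns with the current samples. The warping formula and the brute-force gain/lag/warp search are illustrative assumptions; the paper's three-stage estimation and perceptual distortion-rate selection are omitted.

```python
import numpy as np

def warped_ltp_predict(past, lag, warp, frame_len):
    """Read the past signal with a lag that drifts geometrically over the frame."""
    n = np.arange(frame_len)
    lag_n = lag * warp ** (n / frame_len)    # lag scaled by `warp` across the frame
    pos = np.clip(len(past) + n - lag_n, 0, len(past) - 1)
    return np.interp(pos, np.arange(len(past)), past)

def estimate_ltp(past, frame, lags, warps):
    """Brute-force search for (lag, warp, gain) minimizing squared prediction error."""
    best = (None, None, None, np.inf)
    for lag in lags:
        for warp in warps:
            p = warped_ltp_predict(past, lag, warp, len(frame))
            gain = np.dot(p, frame) / (np.dot(p, p) + 1e-12)
            err = np.sum((frame - gain * p) ** 2)
            if err < best[3]:
                best = (lag, warp, gain, err)
    return best

# Toy usage: a sinusoid whose pitch drifts slowly, as in speech or vocals.
fs = 16000
t = np.arange(4096) / fs
sig = np.sin(2 * np.pi * (200 * t + 40 * t ** 2))
past, frame = sig[:3584], sig[3584:]         # predict a 512-sample frame
lag, warp, gain, err = estimate_ltp(past, frame,
                                    range(540, 620, 4),
                                    np.linspace(0.96, 1.04, 9))
```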
P12-7 Parametric Coding of Piano Signals—Michael Schnabel, Ilmenau University of Technology - Ilmenau, Germany; Benjamin Schubert, Ilmenau University of Technology - Ilmenau, Germany; Fraunhofer IIS - Erlangen, Germany; Gerald Schuller, Ilmenau University of Technology - Ilmenau, Germany
In this paper an audio coding procedure for piano signals is presented, based on a physical model of the piano. Instead of coding the waveform of the signal, compression is achieved by extracting relevant parameters at the encoder; the signal is then re-synthesized at the decoder using the physical model. We describe the development and implementation of algorithms for parameter extraction and the combination of all components into a coder. A formal listening test shows that we can obtain high sound quality at a bit rate lower than that of conventional coders: the proposed piano coder operates at 11.6 kbps, with HE-AAC as the reference codec at a gross bit rate of 16 kbps. For low and medium chords the proposed piano coder outperforms HE-AAC in terms of subjective quality, while for high chords its quality falls below that of HE-AAC.
Convention Paper 8768
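To illustrate the encoder/decoder split in P12-7, the toy sketch below treats the "bitstream" as a short list of note parameters (onset, fundamental, amplitude) and resynthesizes them at the decoder. A handful of inharmonic damped sinusoids stands in for the paper's physical piano model, and every numeric value is an illustrative assumption.

```python
import numpy as np

fs = 44100

def decode_note(f0, amp, dur=1.5, partials=8, B=0.0004):
    """Resynthesize one note from a few inharmonic damped sinusoids
    (a stand-in for the paper's physical piano model)."""
    t = np.arange(int(fs * dur)) / fs
    note = np.zeros_like(t)
    for k in range(1, partials + 1):
        fk = k * f0 * np.sqrt(1 + B * k ** 2)     # piano-like inharmonic partial
        note += (amp / k) * np.exp(-3.0 * k * t) * np.sin(2 * np.pi * fk * t)
    return note

# The "bitstream": a handful of (onset time, fundamental, amplitude) triples
# extracted by the encoder; all values here are illustrative.
params = [(0.0, 261.63, 0.5), (0.5, 329.63, 0.4), (1.0, 392.00, 0.4)]

out = np.zeros(int(fs * 3.0))
for onset, f0, amp in params:
    note = decode_note(f0, amp)
    i = int(onset * fs)
    out[i:i + len(note)] += note[:len(out) - i]
```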