AES London 2010
Poster Session P16
P16 - Audio Processing—Audio Coding and Machine Interface
Monday, May 24, 10:30 — 12:00 (Room C4-Foyer)
P16-1 Trajectory Sampling for Computationally Efficient Reproduction of Moving Sound Sources—Nara Hahn, Keunwoo Choi, Hyunjoo Chung, Koeng-Mo Sung, Seoul National University - Seoul, Korea
Reproducing moving virtual sound sources has been addressed in a number of spatial audio studies. When the trajectories of moving sources are sampled in the temporal domain they are relatively oversampled, which leads to inefficiency in computing the time delay and amplitude attenuation due to sound propagation. In this paper methods for trajectory sampling are proposed that exploit the spatial properties of the movement and the spectral properties of the received signal. These methods reduce the number of sampled positions and thereby the computational complexity. Listening tests were performed to determine the appropriate downsampling rate, which depends not only on the trajectory but also on the frequency content of the source signal.
Convention Paper 8080
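The delay and attenuation computation the abstract refers to can be pictured with a small sketch. The Python fragment below is illustrative only, not the authors' method: the function name render_moving_source, the 1/r gain law, and the linear interpolation of both trajectory and signal are assumptions. It renders a mono source from a trajectory sampled far below the audio rate, interpolating one position per output sample.

    import numpy as np

    C = 343.0   # speed of sound in m/s
    FS = 48000  # audio sample rate in Hz

    def render_moving_source(signal, traj_times, traj_positions, listener=np.zeros(3)):
        """Render a mono source moving along a coarsely sampled trajectory.

        signal         : source samples at FS
        traj_times     : times (s) at which the trajectory is sampled (coarse grid)
        traj_positions : (len(traj_times), 3) source positions in meters
        """
        n = len(signal)
        t = np.arange(n) / FS
        # Interpolate the coarse trajectory up to the audio rate (one position per sample).
        pos = np.stack([np.interp(t, traj_times, traj_positions[:, k]) for k in range(3)], axis=1)
        dist = np.linalg.norm(pos - listener, axis=1)
        delay = dist / C * FS                   # propagation delay in samples
        gain = 1.0 / np.maximum(dist, 0.1)      # 1/r attenuation, limited near the source
        # Read the source signal at the delayed (fractional) sample index.
        read = np.arange(n) - delay
        i0 = np.clip(np.floor(read).astype(int), 0, n - 2)
        frac = np.clip(read - i0, 0.0, 1.0)
        out = gain * ((1 - frac) * signal[i0] + frac * signal[i0 + 1])
        out[read < 0] = 0.0                     # nothing has arrived at the listener yet
        return out

    if __name__ == "__main__":
        # A source receding along the x-axis, trajectory sampled only every 100 ms.
        sig = np.sin(2 * np.pi * 440 * np.arange(FS) / FS)
        times = np.arange(0.0, 1.01, 0.1)
        positions = np.stack([1.0 + 5.0 * times, np.zeros_like(times), np.zeros_like(times)], axis=1)
        y = render_moving_source(sig, times, positions)

The coarser traj_times is allowed to be, the fewer positions must be stored and processed, which is the trade-off the paper's listening tests probe against the trajectory and the source spectrum.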
P16-2 Audio Latency Measurement for Desktop Operating Systems with Onboard Soundcards—Yonghao Wang, Queen Mary University of London - London, UK, and Birmingham City University - Birmingham, UK; Ryan Stables, Birmingham City University - Birmingham, UK; Joshua Reiss, Queen Mary University of London - London, UK
Using commodity computers in conjunction with live music digital audio workstations (DAWs) has become increasingly popular in recent years. The latency of these DAW audio processing chains has always been perceived as a problem for applications such as live audio monitoring when DSP audio effects are needed. With “High Definition Audio” standardized as the onboard soundcard hardware architecture for personal computers, and with advances in audio APIs, low latency and multichannel capability have made their way into home studios. This paper discusses the results of latency measurements of current popular operating systems and host applications with different audio APIs and audio processing loads.
Convention Paper 8081
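One common way to obtain such figures, which may or may not match the paper's exact procedure, is a loopback measurement: play a known test signal out of the soundcard, record it back in, and locate the round-trip delay at the peak of a cross-correlation. The sketch below shows only the delay-estimation step on arrays; the actual audio I/O through whichever API is under test is omitted, and the synthetic stimulus and 517-sample delay are made up for the example.

    import numpy as np

    def roundtrip_latency_ms(stimulus, recorded, fs):
        """Return the delay (ms) at which `recorded` best matches `stimulus`."""
        # Full cross-correlation; the lag of its peak is the round-trip delay.
        corr = np.correlate(recorded, stimulus, mode="full")
        lag = np.argmax(np.abs(corr)) - (len(stimulus) - 1)
        return 1000.0 * lag / fs

    if __name__ == "__main__":
        fs = 48000
        rng = np.random.default_rng(0)
        stimulus = rng.standard_normal(4096)      # broadband test burst
        true_delay = 517                          # pretend the chain delays by 517 samples
        recorded = np.concatenate([np.zeros(true_delay), stimulus])
        recorded += 0.01 * rng.standard_normal(len(recorded))
        print(roundtrip_latency_ms(stimulus, recorded, fs))   # ~10.8 ms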
P16-3 Error Control Techniques within Multidimensional-Adaptive Audio Coding Algorithms—Neil Smyth, APTX - Belfast, N. Ireland, UK
Multidimensional-adaptive audio coding algorithms can adapt multiple performance measures to the demands of different audio applications in real time. Depending on the transmission or storage environment, audio processing applications require forms of error control to maintain acceptable audio quality. By definition, multidimensional-adaptive audio coding utilizes numerous error detection, correction, and concealment techniques. However, such techniques also have implications for other relevant performance measures, such as coded bit rate and computational complexity. This paper discusses the signal-processing tools used by a multidimensional-adaptive audio coding algorithm to achieve varying levels of error control while the fundamental structure of the algorithm is itself varying. The effects and trade-offs on other coding performance measures are also discussed.
Convention Paper 8082
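As a generic illustration of the error-control building blocks named above (detection plus concealment), the fragment below appends a CRC-32 to each coded frame and conceals corrupted frames by repeating the last correctly received one. This is a textbook sketch with invented helper names, not the codec discussed in the paper, and it shows one of the cost trade-offs mentioned: the CRC adds 4 bytes per frame to the coded bit rate.

    import zlib

    def protect(frame_bytes):
        """Append a CRC-32 so the decoder can detect corrupted frames."""
        return frame_bytes + zlib.crc32(frame_bytes).to_bytes(4, "big")

    def receive(frames, decode):
        """Decode protected frames, repeating the last good frame when a CRC fails."""
        last_good = b""     # if the very first frame is corrupt, silence is decoded
        out = []
        for f in frames:
            payload, crc = f[:-4], int.from_bytes(f[-4:], "big")
            if zlib.crc32(payload) == crc:
                last_good = payload
            # else: concealment, reuse the previous correctly received payload
            out.append(decode(last_good))
        return out

    if __name__ == "__main__":
        frames = [protect(b"frame-%d" % i) for i in range(4)]
        frames[2] = b"\x00" * len(frames[2])          # simulate a transmission error
        print(receive(frames, decode=lambda b: b))    # frame-1 is repeated in place of frame-2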
P16-4 Object-Based Audio Coding Using Non-Negative Matrix Factorization for the Spectrogram Representation—Joonas Nikunen, Tuomas Virtanen, Tampere University of Technology - Tampere, Finland
This paper proposes a new object-based audio coding algorithm in which the magnitude spectrogram is represented using non-negative matrix factorization (NMF) and the phase information is coded separately. The magnitude model is obtained using a perceptually weighted NMF algorithm that minimizes the noise-to-mask ratio (NMR) of the decomposition and is able to exploit long-term redundancy through an object-based representation. Methods for the quantization and entropy coding of the NMF representation parameters are proposed, and the resulting quality loss is evaluated using the NMR measure. The quantization of the phase information is also studied. Additionally, we propose a sparseness criterion for the NMF algorithm that favors the gain values with the highest probability, and thus the shortest entropy-coding word length, resulting in a reduced bit rate.
Convention Paper 8083
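The weighted factorization can be sketched as NMF with per-bin weights in a squared-error cost, minimized by multiplicative updates. The code below is a minimal stand-in, not the authors' implementation: in their setting the weights W would be derived from a masking threshold so that minimizing the weighted error approximates minimizing NMR, whereas here W is simply an input of the same shape as the spectrogram.

    import numpy as np

    def weighted_nmf(V, W, rank, n_iter=200, eps=1e-12, seed=0):
        """Factor V ~ B @ H by minimizing || W * (V - B H) ||^2 with multiplicative updates.

        V : magnitude spectrogram (freq x time), non-negative
        W : per-bin weights of the same shape (e.g., an inverse masking threshold)
        """
        rng = np.random.default_rng(seed)
        F, T = V.shape
        B = rng.random((F, rank)) + eps     # spectral basis (the "objects")
        H = rng.random((rank, T)) + eps     # time-varying gains
        for _ in range(n_iter):
            WV = W * V
            WR = W * (B @ H)
            B *= (WV @ H.T) / (WR @ H.T + eps)
            WR = W * (B @ H)
            H *= (B.T @ WV) / (B.T @ WR + eps)
        return B, H

    if __name__ == "__main__":
        rng = np.random.default_rng(1)
        V = np.abs(rng.standard_normal((257, 100)))   # stand-in for an |STFT|
        W = np.ones_like(V)                           # flat weights = plain Euclidean NMF
        B, H = weighted_nmf(V, W, rank=8)
        print(np.linalg.norm(W * (V - B @ H)))

B then holds the spectral objects and H their gains; these are the parameters that the paper quantizes and entropy codes.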
P16-5 Cross-Layer Rate-Distortion Optimization for Scalable Advanced Audio Coding—Emmanuel Ravelli, Vinay Melkote, Tejaswi Nanjundaswamy, Kenneth Rose, University of California, Santa Barbara - Santa Barbara, CA, USA
Current scalable audio codecs optimize each layer of the bit stream successively and independently, with a straightforward application of the same rate-distortion optimization techniques employed in the non-scalable case. The main drawback of this approach is that the performance of the enhancement layers is significantly worse than that of the non-scalable codec at the same cumulative bit rate. In this paper we propose a novel optimization technique in the Advanced Audio Coding (AAC) framework, in which a cross-layer iterative optimization selects the encoding parameters for each layer while explicitly accounting for the rate and distortion costs in all layers, allowing a trade-off between performance at the different layers. Subjective and objective results demonstrate the effectiveness of the proposed approach and provide insights for bridging the gap with the non-scalable codec.
Convention Paper 8084
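The contrast between successive and cross-layer optimization can be shown with a toy Lagrangian search. In the sketch below the option tables, the multiplier lam, and the layer weights w are all hypothetical; real AAC parameter selection operates over scale factors and Huffman codebooks, not abstract option lists. The successive strategy fixes the base layer first, while the cross-layer strategy trades cost between the layers jointly.

    def successive(base_opts, enh_opts, lam):
        """Pick the base layer first, then the enhancement layer given that base."""
        b = min(base_opts, key=lambda o: o["D"] + lam * o["R"])
        e = min(enh_opts[b["id"]], key=lambda o: o["D"] + lam * o["R"])
        return b, e

    def cross_layer(base_opts, enh_opts, lam, w=(0.5, 0.5)):
        """Search base/enhancement choices jointly, weighting the cost of both layers."""
        best, best_cost = None, float("inf")
        for b in base_opts:
            for e in enh_opts[b["id"]]:
                cost = w[0] * (b["D"] + lam * b["R"]) + w[1] * (e["D"] + lam * e["R"])
                if cost < best_cost:
                    best, best_cost = (b, e), cost
        return best

    if __name__ == "__main__":
        base = [{"id": 0, "R": 32, "D": 4.0}, {"id": 1, "R": 48, "D": 2.5}]
        enh = {0: [{"R": 32, "D": 1.0}, {"R": 48, "D": 0.6}],
               1: [{"R": 32, "D": 1.5}, {"R": 48, "D": 0.9}]}
        print(successive(base, enh, lam=0.05))
        print(cross_layer(base, enh, lam=0.05))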
P16-6 Issues and Solutions Related to Real-Time TD-PSOLA Implementation—Sylvain Le Beux, LIMSI-CNRS, Université Paris-Sud XI - Orsay, France; Boris Doval, LAM-IJLRA, Université Paris - Paris, France; Christophe d'Alessandro, LIMSI-CNRS, Université Paris-Sud XI - Orsay, France
This paper presents an adaptation of the TD-PSOLA algorithm for the case where pitch-shifting and time-stretching coefficients must be processed in real time, at every new synthesis pitch mark. In the standard TD-PSOLA algorithm the modification coefficients are defined on the analysis time axis, whereas for real-time applications the pitch and duration control parameters need to be sampled at synthesis time. This paper establishes the theoretical correspondence between the two approaches. Another issue related to the real-time context concerns the trade-off between the latency required for processing and the type of analysis window used.
Convention Paper 8085
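For reference, the standard offline TD-PSOLA pitch-shift operation that the paper adapts to a real-time setting can be sketched as follows. The sketch assumes the analysis pitch marks are already known and uses a Hann window spanning two local periods; the window choice, mark handling, and boundary treatment are simplifications, not the authors' real-time formulation.

    import numpy as np

    def td_psola_pitch_shift(x, marks, pitch_factor):
        """Raise (factor > 1) or lower (factor < 1) pitch while keeping duration.

        marks: analysis pitch-mark sample indices, one per period, strictly increasing."""
        marks = np.asarray(marks)
        y = np.zeros(len(x))
        t_syn = float(marks[1])
        while t_syn < marks[-2]:
            # Nearest analysis mark and its local period
            i = int(np.clip(np.argmin(np.abs(marks - t_syn)), 1, len(marks) - 2))
            period = int(marks[i + 1] - marks[i - 1]) // 2
            lo, hi = marks[i] - period, marks[i] + period
            start = int(round(t_syn)) - period
            if lo >= 0 and hi <= len(x) and start >= 0 and start + (hi - lo) <= len(y):
                # Overlap-add a two-period, Hann-windowed grain at the synthesis mark
                y[start : start + (hi - lo)] += np.hanning(hi - lo) * x[lo:hi]
            t_syn += period / pitch_factor   # synthesis marks spaced by the scaled period
        return y

    if __name__ == "__main__":
        fs = 48000
        t = np.arange(fs) / fs
        x = np.sin(2 * np.pi * 100 * t)              # 100 Hz tone
        marks = np.arange(0, len(x), fs // 100)      # one pitch mark per period
        y = td_psola_pitch_shift(x, marks, 1.2)      # ~120 Hz, same duration

In this offline form the modification is driven from the analysis marks; the paper's concern is how to drive the same overlap-add when the pitch and duration controls are only available at each new synthesis mark.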
P16-7 Integrating Musicological Knowledge into a Probabilistic Framework for Chord and Key Extraction—Johan Pauwels, Jean-Pierre Martens, Ghent University - Ghent, Belgium
In this paper a previously developed probabilistic framework for the simultaneous detection of chords and keys in polyphonic audio is further extended and validated. The system behavior is controlled by a small set of carefully defined free parameters. This has allowed us to conduct an experimental study that sheds new light on the relative importance of musicological knowledge in the context of chord extraction. Some of the obtained results are surprising and, to our knowledge, have not been reported as such before.
Convention Paper 8086
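The decoding side of such a probabilistic framework is typically a Viterbi search over chord (and key) states, with musicological knowledge entering through the transition scores. The sketch below is only the generic Viterbi step on log-probabilities; the joint chord/key state space, the emission model, and the actual parameters studied in the paper are not reproduced, and the sticky transition matrix in the example is a crude stand-in.

    import numpy as np

    def viterbi(log_emit, log_trans, log_init):
        """log_emit: (T, S) per-frame state log-likelihoods; log_trans: (S, S); log_init: (S,)."""
        T, S = log_emit.shape
        delta = log_init + log_emit[0]
        psi = np.zeros((T, S), dtype=int)
        for t in range(1, T):
            scores = delta[:, None] + log_trans           # (from_state, to_state)
            psi[t] = np.argmax(scores, axis=0)
            delta = scores[psi[t], np.arange(S)] + log_emit[t]
        path = [int(np.argmax(delta))]
        for t in range(T - 1, 0, -1):                     # backtrack the best state sequence
            path.append(int(psi[t][path[-1]]))
        return path[::-1]

    if __name__ == "__main__":
        rng = np.random.default_rng(0)
        S = 24                                            # e.g., 12 major + 12 minor chords
        log_emit = np.log(rng.dirichlet(np.ones(S), size=50))
        trans = np.full((S, S), 0.5 / (S - 1)) + np.eye(S) * (0.5 - 0.5 / (S - 1))
        print(viterbi(log_emit, np.log(trans), np.full(S, -np.log(S))))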
P16-8 A Doubly Sparse Greedy Adaptive Dictionary Learning Algorithm for Music and Large-Scale Data—Maria G. Jafari, Mark D. Plumbley, Queen Mary University of London - London, UK
We consider the extension of the greedy adaptive dictionary learning algorithm that we introduced previously to applications other than speech signals. The algorithm learns a dictionary of sparse atoms while yielding a sparse representation of the speech signals. We investigate its behavior in the analysis of music signals and propose a different dictionary learning approach that can be applied to large data sets. This facilitates the application of the algorithm to problems that generate large amounts of data, such as multimedia and multichannel applications.
Convention Paper 8087
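As background, here is a minimal sketch of a greedy adaptive dictionary step in the spirit of the authors' earlier GAD algorithm, as commonly described: each atom is taken from the residual frame with the lowest l1/l2 sparsity index, and that atom's contribution is then removed from every residual frame. This is an illustrative reconstruction, not the doubly sparse, large-scale variant proposed in the paper.

    import numpy as np

    def gad(X, n_atoms):
        """Greedy adaptive dictionary learning sketch.

        X: (frame_length, n_frames) matrix of signal frames (columns);
        returns a (frame_length, n_atoms) dictionary of unit-norm atoms."""
        R = X.astype(float).copy()               # residual frames
        atoms = []
        for _ in range(n_atoms):
            l1 = np.abs(R).sum(axis=0)
            l2 = np.linalg.norm(R, axis=0) + 1e-12
            k = int(np.argmin(l1 / l2))          # frame with the lowest sparsity index
            atom = R[:, k] / l2[k]
            atoms.append(atom)
            R -= np.outer(atom, atom @ R)        # remove the atom's component from every frame
        return np.stack(atoms, axis=1)

    if __name__ == "__main__":
        rng = np.random.default_rng(0)
        X = rng.standard_normal((512, 64))       # 64 frames of length 512 as columns
        D = gad(X, n_atoms=16)
        print(D.shape)                           # (512, 16)

The large-data issue the abstract points to is visible even here: the residual matrix R grows with the number of frames, which is what the proposed modification for large data sets is meant to address.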