Session D: COMPUTER AND INTERNET AUDIO
Friday, May 10, 13:30–17:00 h

Much interest has recently been directed toward systems for audio fingerprinting, which enable automatic identification of audio content by extracting unique signatures from the signal. Requirements for such systems include robustness to a wide range of signal distortions and the availability of fast search methods, even for large fingerprint databases. This paper describes the provisions of the MPEG-7 standard for audio fingerprinting, which allow for interoperability of fingerprint information generated according to the open, standardized specification for extraction. In particular, it discusses the ability to generate scalable fingerprints providing different trade-offs between fingerprint compactness, temporal coverage and robustness of recognition, and gives experimental results for various system configurations.

In this paper we present a new method for robust audio identification. Based on our existing audio indexing technology, we developed new methods to query large audio databases with highly distorted versions of an audio signal or parts of it. For instance, the database could be queried by transmitting a piece of music over a cellular phone. In contrast to recent approaches, arbitrary segments of a piece of music are allowed as a query. We demonstrate that, for any short audio fragment longer than approximately five seconds, our method is able to identify the corresponding piece of audio along with the exact position of the fragment within the original signal. Our approach relies only on features extracted from the audio signals, making the embedding of, e.g., watermarks unnecessary.

Natural and structured audio coding are based on two fundamentally different ways of representing sound information; a combination of the two approaches can lead to efficient and improved storage and transmission of both speech and music. Integrating natural audio tracks with synthetic sound and digital post-processing is a challenging effort, especially when the audio rendering requires good quality and precise synchronization with video and graphic information, as is the case in standardized multimedia frameworks. In this paper those challenges are analyzed and a new player is presented, which integrates all the mentioned features in a normative context.

MPEG-4 is designed as a transport-independent, universal coding standard for multimedia content. ISMA, a consortium of 30 companies, has created an open, non-proprietary transport protocol for streaming of MPEG-4 over IP networks. For audio data, the ISMA 1.0 specification contains several modes, including optional interleaved transmission. At Fraunhofer IIS, ISMA-compliant server and client applications have been implemented and are used for various performance and robustness tests. In the paper we investigate the characteristics of MPEG-4 Audio over the ISMA protocol. In particular, we show how packet loss affects the subjective quality of the audio signal and quantify how well interleaving and concealment can restore a damaged signal.
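The two fingerprinting abstracts above both hinge on matching a short, possibly distorted fragment against a database of reference fingerprints and locating its position within the original recording. The following Python sketch illustrates the basic lookup with a toy fingerprint (the sign of frame-energy differences) and a brute-force sliding bit-error search; the actual MPEG-7 and Fraunhofer features and search structures differ, so every parameter here is an illustrative assumption.

    # Toy fragment lookup by fingerprint matching.
    # The fingerprint (sign of frame-energy differences) and the brute-force
    # search are illustrative assumptions, not the MPEG-7 or Fraunhofer method.
    import numpy as np

    FRAME = 1024  # samples per analysis frame (assumed)

    def fingerprint(signal, frame=FRAME):
        # One bit per frame transition: does the frame energy rise or fall?
        n = len(signal) // frame
        energy = np.array([np.sum(signal[i * frame:(i + 1) * frame] ** 2)
                           for i in range(n)])
        return (np.diff(energy) > 0).astype(np.uint8)

    def locate(fragment_fp, reference_fp):
        # Slide the fragment over the reference, return (offset, bit error rate).
        best_offset, best_ber = None, 1.0
        for off in range(len(reference_fp) - len(fragment_fp) + 1):
            ber = np.mean(reference_fp[off:off + len(fragment_fp)] != fragment_fp)
            if ber < best_ber:
                best_offset, best_ber = off, ber
        return best_offset, best_ber

    # Example: locate a noisy 6 s excerpt inside a 30 s reference signal.
    rng = np.random.default_rng(0)
    reference = rng.standard_normal(48_000 * 30)
    start = 500 * FRAME                                   # frame-aligned cut for this toy setup
    excerpt = reference[start:start + 48_000 * 6].copy()
    excerpt += 0.3 * rng.standard_normal(len(excerpt))    # simulated distortion
    offset_frames, ber = locate(fingerprint(excerpt), fingerprint(reference))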
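The ISMA abstract above quantifies how well interleaving and concealment can restore a signal damaged by packet loss. The sketch below shows the principle with an assumed interleave depth of four frames per packet and a trivial repeat-previous-frame concealment; neither the depth nor the concealment method is taken from the paper.

    # Illustrative sketch of frame interleaving and loss concealment.
    # Interleave depth, frame content and the repeat-previous-frame concealment
    # are assumptions for illustration, not the ISMA / Fraunhofer implementation.

    INTERLEAVE_DEPTH = 4   # packets over which consecutive frames are spread

    def interleave(frames, depth=INTERLEAVE_DEPTH):
        # Packet j carries frames j, j+depth, j+2*depth, ... (with their indices).
        packets = [[] for _ in range(depth)]
        for i, frame in enumerate(frames):
            packets[i % depth].append((i, frame))
        return packets

    def deinterleave_and_conceal(packets, total_frames):
        # Rebuild the frame sequence; a missing frame repeats the last good one.
        received = {}
        for packet in packets:
            if packet is None:              # packet lost in the network
                continue
            for index, frame in packet:
                received[index] = frame
        output, last_good = [], b"\x00"
        for i in range(total_frames):
            frame = received.get(i)
            if frame is None:
                frame = last_good           # trivial concealment
            else:
                last_good = frame
            output.append(frame)
        return output

    # Example: 16 frames, one packet lost. Only every 4th frame is missing,
    # so each gap is isolated and easy to conceal; without interleaving the
    # same loss would wipe out 4 consecutive frames.
    frames = [bytes([i]) for i in range(16)]
    packets = interleave(frames)
    packets[1] = None                       # simulate loss of the second packet
    restored = deinterleave_and_conceal(packets, len(frames))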
This paper proposes a real-time audio streaming method that dynamically adapts to time-varying network loads. Most current Internet services are best-effort services in that Quality of Service (QoS) is not guaranteed. The proposed streaming method is based on the RTP/RTCP/UDP protocols in order to handle transient changes in network load. To cope with time-varying network conditions, the initial network bandwidth and the currently available bandwidth are measured. Bit-Sliced Arithmetic Coding (BSAC), which provides efficient Fine Grain Scalability (FGS), is used to adapt to network load fluctuations. The results show that the proposed scheme with the BSAC tool allows real-time continuous playback over networks by maximizing the QoS of the transported data.

The paper examines the transmission of audio coded according to the MPEG-1 Layer III standard over the Bluetooth wireless digital protocol. The study presents the effect of the transmission parameters (such as distance, packet type and length) on the achieved wireless bit rate. The paper also analyzes several aspects concerning the real-time implementation of complete Bluetooth-based audio playback setups and addresses the effects of using single-channel, stereo or multichannel audio streams.

Modern real-time digital audio communication systems rely heavily on discrete-time processing at every stage of the journey from source to destination. More often than not implemented on the ubiquitous DSP, processes such as A/D and D/A conversion, audio compression for data-rate reduction, sample-rate conversion, reverse or inverse multiplexing, transfer of data through the layers of a telecom protocol stack, etc., all contribute to the end-to-end delay of the audio. Knowledge of what causes these delays can aid system designers and integrators in setting up audio links that minimize such delays. This paper examines the practical details of why and where these delays occur.
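The last abstract catalogues the sources of end-to-end delay in a digital audio link. As a rough illustration of how such delays add up, the sketch below sums the latency contribution of each stage of a hypothetical chain; all stage values are assumptions chosen for illustration, not figures from the paper.

    # Rough end-to-end latency budget for a hypothetical real-time audio link.
    # Every stage value below is an illustrative assumption, not a measurement
    # from the paper.

    SAMPLE_RATE = 48_000  # Hz

    def samples_to_ms(n):
        return 1000.0 * n / SAMPLE_RATE

    stages = {
        "A/D converter buffer":             samples_to_ms(64),
        "codec frame (1024 samples)":       samples_to_ms(1024),
        "codec look-ahead":                 samples_to_ms(576),
        "packetisation and protocol stack": 5.0,    # ms, assumed
        "network transit":                  20.0,   # ms, assumed
        "receiver jitter buffer":           40.0,   # ms, assumed
        "D/A converter buffer":             samples_to_ms(64),
    }

    for name, ms in stages.items():
        print(f"{name:34s} {ms:6.1f} ms")
    print(f"{'end-to-end delay':34s} {sum(stages.values()):6.1f} ms")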