Session D: COMPUTER AND INTERNET AUDIO
Friday, May 10, 13:30–17:00 h

Much interest has recently been directed toward systems for audio fingerprinting, which enable automatic identification of audio content by extracting unique signatures from the signal. Requirements for such systems include robustness to a wide range of signal distortions and the availability of fast search methods, even for large fingerprint databases. This paper describes the provisions of the MPEG-7 standard for audio fingerprinting, which allow for interoperability of fingerprint information generated according to the open, standardized specification for extraction. In particular, it discusses the ability to generate scalable fingerprints providing different trade-offs between fingerprint compactness, temporal coverage and robustness of recognition, and gives experimental results for various system configurations.

In this paper we present a new method for robust audio identification. Based on our existing audio indexing technology, we developed new methods to query large audio databases with highly distorted versions of an audio signal or parts of it. For instance, the database could be queried by transmitting a piece of music over a cellular phone. In contrast to recent approaches, arbitrary segments of a piece of music are allowed as a query. We demonstrate that, for any short audio fragment longer than approximately five seconds, our method is able to identify the corresponding piece of audio along with the exact position of the fragment within the original signal. Our approach relies only on features extracted from the audio signals, making the embedding of, e.g., watermarks unnecessary.

Natural and structured audio coding are based on two fundamentally different ways of representing sound information; a combination of the two approaches can lead to efficient and improved storage and transmission of both speech and music. Integrating natural audio tracks with synthetic sound and digital post-processing is a challenging effort, especially when the audio rendering requires good quality and precise synchronization with video and graphic information, as is the case in standardized multimedia frameworks. In this paper those challenges are analyzed and a new player is presented, which integrates all the mentioned features in a normative context.

MPEG-4 is designed as a transport-independent, universal coding standard for multimedia content. ISMA, a consortium of 30 companies, has created an open, non-proprietary transport protocol for streaming of MPEG-4 over IP networks. For audio data, the ISMA 1.0 specification contains several modes, including optional interleaved transmission. At Fraunhofer IIS, ISMA-compliant server and client applications have been implemented and are used for various performance and robustness tests. In the paper we investigate the characteristics of MPEG-4 Audio over the ISMA protocol. In particular, we show how packet loss affects the subjective quality of the audio signal and quantify how well interleaving and concealment can restore a damaged signal.
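The two fingerprinting abstracts above both hinge on matching a short, possibly distorted fragment against a database of reference fingerprints and locating its position within the original recording. The following Python sketch illustrates the basic lookup with a toy fingerprint (the sign of frame-energy differences) and a brute-force sliding bit-error search; the actual MPEG-7 and Fraunhofer features and search structures differ, so every parameter here is an illustrative assumption.

    # Toy fragment lookup by fingerprint matching.
    # The fingerprint (sign of frame-energy differences) and the brute-force
    # search are illustrative assumptions, not the MPEG-7 or Fraunhofer method.
    import numpy as np

    FRAME = 1024  # samples per analysis frame (assumed)

    def fingerprint(signal, frame=FRAME):
        # One bit per frame transition: does the frame energy rise or fall?
        n = len(signal) // frame
        energy = np.array([np.sum(signal[i * frame:(i + 1) * frame] ** 2)
                           for i in range(n)])
        return (np.diff(energy) > 0).astype(np.uint8)

    def locate(fragment_fp, reference_fp):
        # Slide the fragment over the reference, return (offset, bit error rate).
        best_offset, best_ber = None, 1.0
        for off in range(len(reference_fp) - len(fragment_fp) + 1):
            ber = np.mean(reference_fp[off:off + len(fragment_fp)] != fragment_fp)
            if ber < best_ber:
                best_offset, best_ber = off, ber
        return best_offset, best_ber

    # Example: locate a noisy 6 s excerpt inside a 30 s reference signal.
    rng = np.random.default_rng(0)
    reference = rng.standard_normal(48_000 * 30)
    start = 500 * FRAME                                   # frame-aligned cut for this toy setup
    excerpt = reference[start:start + 48_000 * 6].copy()
    excerpt += 0.3 * rng.standard_normal(len(excerpt))    # simulated distortion
    offset_frames, ber = locate(fingerprint(excerpt), fingerprint(reference))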
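The ISMA abstract above quantifies how well interleaving and concealment can restore a signal damaged by packet loss. The sketch below shows the principle with an assumed interleave depth of four frames per packet and a trivial repeat-previous-frame concealment; neither the depth nor the concealment method is taken from the paper.

    # Illustrative sketch of frame interleaving and loss concealment.
    # Interleave depth, frame content and the repeat-previous-frame concealment
    # are assumptions for illustration, not the ISMA / Fraunhofer implementation.

    INTERLEAVE_DEPTH = 4   # packets over which consecutive frames are spread

    def interleave(frames, depth=INTERLEAVE_DEPTH):
        # Packet j carries frames j, j+depth, j+2*depth, ... (with their indices).
        packets = [[] for _ in range(depth)]
        for i, frame in enumerate(frames):
            packets[i % depth].append((i, frame))
        return packets

    def deinterleave_and_conceal(packets, total_frames):
        # Rebuild the frame sequence; a missing frame repeats the last good one.
        received = {}
        for packet in packets:
            if packet is None:              # packet lost in the network
                continue
            for index, frame in packet:
                received[index] = frame
        output, last_good = [], b"\x00"
        for i in range(total_frames):
            frame = received.get(i)
            if frame is None:
                frame = last_good           # trivial concealment
            else:
                last_good = frame
            output.append(frame)
        return output

    # Example: 16 frames, one packet lost. Only every 4th frame is missing,
    # so each gap is isolated and easy to conceal; without interleaving the
    # same loss would wipe out 4 consecutive frames.
    frames = [bytes([i]) for i in range(16)]
    packets = interleave(frames)
    packets[1] = None                       # simulate loss of the second packet
    restored = deinterleave_and_conceal(packets, len(frames))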
This paper proposes a real-time audio streaming method that dynamically adapts to time-varying network loads. Most current Internet services are best-effort services in that Quality of Service (QoS) is not guaranteed. The proposed streaming method is based on the RTP/RTCP/UDP protocols in order to handle transient changes in network load. To cope with time-varying network conditions, the initial network bandwidth and the currently available bandwidth are measured. Bit-Sliced Arithmetic Coding (BSAC), which provides efficient Fine Grain Scalability (FGS), is used to adapt to network load fluctuations. The results show that the proposed scheme with the BSAC tool allows real-time continuous playback over networks by maximizing the QoS of the transported data.

The paper examines the transmission of audio coded according to the MPEG-1 Layer III standard over the Bluetooth wireless digital protocol. The study presents the effect of the transmission parameters (such as distance, packet type and length) on the achieved wireless bit rate. The paper also analyzes several aspects concerning the real-time implementation of complete Bluetooth-based audio playback setups and addresses the effects of using single-channel, stereo or multichannel audio streams.

Modern real-time digital audio communication systems rely heavily on discrete-time processing at every stage of the journey from source to destination. More often than not implemented on the ubiquitous DSP, processes such as A/D and D/A conversion, audio compression for data-rate reduction, sample-rate conversion, reverse or inverse multiplexing, transfer of data through the layers of a telecom protocol stack, etc., all contribute to the end-to-end delay of the audio. Knowledge of what causes these delays can aid system designers and integrators in setting up audio links that minimize such delays. This paper examines the practical details of why and where these delays occur.
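The last abstract catalogues the sources of end-to-end delay in a digital audio link. As a rough illustration of how such delays add up, the sketch below sums the latency contribution of each stage of a hypothetical chain; all stage values are assumptions chosen for illustration, not figures from the paper.

    # Rough end-to-end latency budget for a hypothetical real-time audio link.
    # Every stage value below is an illustrative assumption, not a measurement
    # from the paper.

    SAMPLE_RATE = 48_000  # Hz

    def samples_to_ms(n):
        return 1000.0 * n / SAMPLE_RATE

    stages = {
        "A/D converter buffer":             samples_to_ms(64),
        "codec frame (1024 samples)":       samples_to_ms(1024),
        "codec look-ahead":                 samples_to_ms(576),
        "packetisation and protocol stack": 5.0,    # ms, assumed
        "network transit":                  20.0,   # ms, assumed
        "receiver jitter buffer":           40.0,   # ms, assumed
        "D/A converter buffer":             samples_to_ms(64),
    }

    for name, ms in stages.items():
        print(f"{name:34s} {ms:6.1f} ms")
    print(f"{'end-to-end delay':34s} {sum(stages.values()):6.1f} ms")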