Saturday, October 1, 10:45 am — 12:15 pm (Rm 409B)
Chair: Jean-Marc Jot, DTS, Inc. - Los Gatos, CA, USA
P19-1 Efficient Multichannel Audio Transform Coding with Low Delay and Complexity—Florian Schuh, Fraunhofer Institute for Integrated Circuits IIS - Erlangen, Germany; Sascha Dick, Fraunhofer Institute for Integrated Circuits IIS - Erlangen, Germany; Richard Füg, Fraunhofer Institute for Integrated Circuits IIS - Erlangen, Germany; Christian R. Helmrich, International Audio Laboratories - Erlangen, Germany; Nikolaus Rettelbach, Fraunhofer Institute for Integrated Circuits IIS - Erlangen, Germany; Tobias Schwegler, Fraunhofer Institute for Integrated Circuits IIS - Erlangen, Germany
For multichannel input such as 5.1-surround material, contemporary transform-based perceptual audio codecs provide good coding quality even at very low bit-rates. These codecs rely on discrete joint-channel coding and parametric spatial coding schemes. The latter require dedicated complex-valued filter-banks around the core-codec, which increase both the algorithmic complexity and latency. This paper demonstrates that the discrete joint-channel as well as known semi-parametric spatial coding principles can also be realized directly within the real-valued modified discrete cosine transform (MDCT) domain of the core-coder, thereby eliminating the need for auxiliary filter-banks. The resulting fully flexible signal-adaptive coding scheme, when integrated into the MPEG-H 3D Audio codec, offers the same quality as the state of the art even at bit-rates as low as 80 kbit/s for 5.1-surround.
Convention Paper 9660
P19-2 Intelligent Gap Filling in Perceptual Transform Coding of Audio—Sascha Disch, Fraunhofer IIS, Erlangen - Erlangen, Germany; Andreas Niedermeier, Fraunhofer Institute for Integrated Circuits IIS - Erlangen, Germany; Christian R. Helmrich, International Audio Laboratories - Erlangen, Germany; Christian Neukam, Fraunhofer Institute for Integrated Circuits IIS - Erlangen, Germany; Konstantin Schmidt, International Audio Laboratories Erlangen - Erlangen, Germany; Ralf Geiger, Fraunhofer Institute for Integrated Circuits IIS - Erlangen, Germany; Jérémie Lecomte, Fraunhofer Institute for Integrated Circuits IIS - Erlangen, Germany; Florin Ghido, Fraunhofer Institute for Integrated Circuits IIS - Erlangen, Germany; Frederik Nagel, Fraunhofer Institute for Integrated Circuits IIS - Erlangen, Germany; International Audio Laboratories - Erlangen, Germany; Bernd Edler, International Audio Laboratories Erlangen - Erlangen, Germany
Intelligent Gap Filling (IGF) denotes a semi-parametric coding technique within modern codecs like MPEG-H 3D Audio or the 3GPP EVS codec. IGF can be applied to fill spectral holes introduced by the quantization process in the encoder due to low-bitrate constraints. Typically, if the limited bit budget does not allow for transparent coding, spectral holes emerge in the high-frequency (HF) region of the signal first and increasingly affect the entire upper spectral range at the lowest bitrates. At the decoder side, such spectral holes are substituted via IGF using synthetic HF content generated from low-frequency (LF) content, with post-processing controlled by additional parametric side information. This paper provides an overview of the principles and functionalities of IGF and presents listening test data assessing the perceptual quality of IGF-coded audio material.
Convention Paper 9661
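As a rough illustration of the gap-filling idea summarized in the P19-2 abstract above, the following sketch copies low-frequency coefficients into zeroed high-frequency bins and rescales them to a per-band target energy. It is not the IGF algorithm from the paper or from any standard: the function name `fill_spectral_holes`, the fixed-width band layout, the source-tile mapping, and the use of plain per-band energies as side information are all assumptions made for illustration.

```python
import math


def fill_spectral_holes(spectrum, hf_start, band_width, target_energies):
    """Hypothetical sketch: fill zeroed HF bins with scaled copies of LF bins.

    spectrum        -- dequantized real-valued coefficients (e.g. MDCT bins)
    hf_start        -- first bin of the high-frequency region (assumed layout)
    band_width      -- width of each HF band in bins (assumed uniform)
    target_energies -- per-band energies standing in for the side information
    """
    if hf_start <= 0:
        raise ValueError("hf_start must be positive")
    out = list(spectrum)
    for b, target in enumerate(target_energies):
        lo = hf_start + b * band_width
        hi = min(lo + band_width, len(out))
        # Bins in this band that the quantizer left at zero.
        holes = [k for k in range(lo, hi) if out[k] == 0.0]
        if not holes:
            continue
        # Source tile: LF bins chosen by a simple wrap-around mapping (assumption).
        tile = [spectrum[(k - hf_start) % hf_start] for k in holes]
        tile_energy = sum(t * t for t in tile)
        if tile_energy == 0.0:
            continue
        # Energy already carried by coefficients that survived quantization.
        kept = sum(out[k] ** 2 for k in range(lo, hi))
        gain = math.sqrt(max(target - kept, 0.0) / tile_energy)
        for k, t in zip(holes, tile):
            out[k] = gain * t
    return out


# Usage: four LF bins survive, four HF bins were quantized to zero.
decoded = [4.0, -3.0, 2.0, 1.0, 0.0, 0.0, 0.0, 0.0]
filled = fill_spectral_holes(decoded, hf_start=4, band_width=2,
                             target_energies=[1.0, 0.25])
```

The sketch leaves surviving coefficients untouched and only synthesizes content where the spectrum is empty, which mirrors the abstract's description of substituting holes at the decoder side.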
P19-3 Sonic Quick Response Codes (SQRC) for Embedding Inaudible Metadata in Sound Files—Mark Sheppard, Anglia Ruskin University - Cambridge, Cambridgeshire, UK; Rob Toulson, Anglia Ruskin University - Cambridge, UK; Mariana Lopez, Anglia Ruskin University - Cambridge, UK
With the advent of high-definition recording and playback systems, a proportion of the ultrasonic frequency spectrum can potentially be used as a container for imperceptible data and used to trigger events or to hold metadata in the form of text, ISRC (International Standard Recording Code), or a website URL. The Sonic Quick Response Code (SQRC) algorithm is proposed as a method for embedding inaudible acoustic metadata within a 96 kHz audio file in the 30–35 kHz bandwidth range. Thus, any receiver that has sufficient bandwidth and decoding software installed can immediately find metadata on the audio being played. SQRC data was mixed at random periods into 96 kHz music audio files, and listening subjects were asked to identify if they perceived the introduction of the high-frequency content. Results show that none of the subjects in this pilot study could perceive the 30–35 kHz material. As a result, it is shown that it is possible to conduct high-resolution audio testing without significant or perceptible artifacts caused by intermodulation distortion.
Convention Paper 9662
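The P19-3 abstract above does not describe how SQRC encodes its payload, so the following is only a generic sketch of the underlying idea: hiding bits in the 30–35 kHz band of a 96 kHz signal. It substitutes simple binary frequency-shift keying for whatever modulation the paper uses. The tone frequencies, the 10 ms bit length, the amplitude, and the names `embed_bits`, `goertzel_power`, and `decode_bits` are all assumptions.

```python
import math

FS = 96_000                    # sample rate named in the abstract
F0, F1 = 31_000.0, 34_000.0    # assumed tones for bit 0 / bit 1, inside 30-35 kHz
BIT_LEN = 960                  # assumed 10 ms per bit at 96 kHz


def embed_bits(host, bits, amplitude=0.01):
    """Mix one low-level ultrasonic tone per bit into a copy of the host signal."""
    if len(host) < len(bits) * BIT_LEN:
        raise ValueError("host signal too short for payload")
    out = list(host)
    for i, bit in enumerate(bits):
        freq = F1 if bit else F0
        for n in range(BIT_LEN):
            out[i * BIT_LEN + n] += amplitude * math.sin(2 * math.pi * freq * n / FS)
    return out


def goertzel_power(block, freq):
    """Power of a single frequency component via the Goertzel recurrence."""
    coeff = 2.0 * math.cos(2.0 * math.pi * freq / FS)
    s_prev, s_prev2 = 0.0, 0.0
    for x in block:
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    return s_prev * s_prev + s_prev2 * s_prev2 - coeff * s_prev * s_prev2


def decode_bits(signal, n_bits):
    """Recover bits by comparing the energy at the two assumed tones per block."""
    bits = []
    for i in range(n_bits):
        block = signal[i * BIT_LEN:(i + 1) * BIT_LEN]
        bits.append(1 if goertzel_power(block, F1) > goertzel_power(block, F0) else 0)
    return bits


# Usage: hide one byte in a 1 kHz test tone and read it back.
payload = [1, 0, 1, 1, 0, 0, 1, 0]
host = [0.5 * math.sin(2 * math.pi * 1000 * n / FS) for n in range(len(payload) * BIT_LEN)]
marked = embed_bits(host, payload)
recovered = decode_bits(marked, len(payload))
```

A real system would also need synchronization, error protection, and playback hardware that actually reproduces content above 30 kHz; none of that is modeled here.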