Audio Engineering Society AES Paris 2016

Poster Session P5

P5 - Audio Equipment, Audio Formats, and Audio Signal Processing Part 1


Saturday, June 4, 13:30 — 15:30 (Foyer)

P5-1 A Comparison of Optimization Methods for Compression Driver Design
Michele Gasparini, Università Politecnica delle Marche - Ancona, Italy; Emiliano Capucci, Faital S.P.A. - Milan, Italy; Stefania Cecchi, Università Politecnica delle Marche - Ancona, Italy; Romolo Toppi, Faital S.P.A. - Milan, Italy; Francesco Piazza, Università Politecnica delle Marche - Ancona (AN), Italy
Finite element analysis is a powerful and widespread mathematical technique capable of modeling even very complex physical systems. The use of this method is quite common in loudspeaker design processes, although simulations may often become time consuming. In order to reduce the number of simulations needed to define an optimal design, some advanced metaheuristic algorithms can be employed. The use of these techniques is well known in many optimization tasks when an analytical description of the system is not available a priori. In this paper a comparison among three different optimization procedures in the design of a compression driver is presented. The algorithms will be evaluated in terms of both convergence time and residual error.
Convention Paper 9508

P5-2 Physically-Based Large-Signal Modeling for Miniature Type Dual Triode Tube
Shiori Oshimo, Hiroshima Institute of Technology - Hiroshima, Japan; Kanako Takemoto, Hiroshima Institute of Technology - Hiroshima, Japan; Toshihiko Hamasaki, Hiroshima Institute of Technology - Hiroshima, Japan
A precise SPICE model for the miniature (MT) triode tubes 12AX7 (high-µ) and 12AU7 (medium-µ) is proposed, based on a physical analysis of measurement results. By comparing the characteristics of these tubes, the grid current at low plate voltage and positive grid bias is modeled successfully with novel perveance parameters for the first time, although it was already known that the perveance depends on both grid and plate bias. It is shown that the modulation factor of the space charge for the MT triodes differs from that of other classic tubes. The model is implemented in LTspice and closely replicates grid current and cathode current over a range of three orders of magnitude. [Also a lecture: see session P1-2]
Convention Paper 9485

P5-3 Delay-Reduced Mode of MPEG-4 Enhanced Low Delay AAC (AAC-ELD)
Markus Schnell, Fraunhofer IIS - Erlangen, Germany; Wolfgang Jaegers, Fraunhofer Institute for Integrated Circuits IIS - Erlangen, Germany; Pablo Delgado, Fraunhofer Institute for Integrated Circuits IIS - Erlangen, Germany; Conrad Benndorf, Fraunhofer Institute for Integrated Circuits IIS - Erlangen, Germany; Tobias Albert, Fraunhofer Institute for Integrated Circuits IIS - Erlangen, Germany; Manfred Lutzky, Fraunhofer Institute for Integrated Circuits IIS - Erlangen, Germany
The MPEG-4 AAC Enhanced Low Delay (AAC-ELD) coder is well established in high quality communication applications, such as Apple’s FaceTime, as well as in professional live broadcasting. Both applications require high interactivity, which typically demands an algorithmic codec delay between 15 ms and 35 ms. Recently, MPEG finalized a new delay-reduced mode for AAC-ELD featuring only a fraction of the regular algorithmic delay. This mode operates virtually at higher sampling rates while maintaining standard sampling rates for I/O. With this feature, AAC-ELD can address even more delay-critical applications, such as wireless microphones or headsets for TV. In this paper the main details of the delay-reduced mode of AAC-ELD are presented and application scenarios are outlined. Audio quality aspects are discussed and compared against other codecs with a delay below 10 ms. [Also a lecture: see session P1-5]
Convention Paper 9488
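
The P5-3 abstract notes that the delay-reduced mode runs virtually at a higher sampling rate while I/O stays at a standard rate. The arithmetic behind that idea can be sketched as follows: a delay that is fixed in samples takes less wall-clock time when the internal rate is higher. The 720-sample figure, the 48 kHz I/O rate, and the function name are illustrative assumptions, not values taken from the paper or the standard.

```python
def algorithmic_delay_ms(delay_samples, sample_rate_hz):
    """Convert a delay expressed in samples into milliseconds."""
    return 1000.0 * delay_samples / sample_rate_hz


# Hypothetical codec delay in samples (assumption, for illustration only).
HYPOTHETICAL_DELAY_SAMPLES = 720
# Assumed standard I/O sampling rate.
IO_RATE_HZ = 48000

if __name__ == "__main__":
    # Raising the virtual internal rate shrinks the same sample count in time.
    for factor in (1, 2, 4):
        internal_rate_hz = IO_RATE_HZ * factor
        delay = algorithmic_delay_ms(HYPOTHETICAL_DELAY_SAMPLES, internal_rate_hz)
        print(f"internal rate {internal_rate_hz} Hz: {delay} ms")
```

Under these assumed numbers, doubling the internal rate halves the delay in milliseconds, which is one way to read "only a fraction of the regular algorithmic delay".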

P5-4 Advances to a Frequency-Domain Parametric Coder of Wideband Speech
Aníbal Ferreira, University of Porto - Porto, Portugal; Deepen Sinha, ATC Labs - Newark, NJ, USA
In recent years, tools in perceptual coding of high-quality audio have been tailored to capture highly detailed information regarding signal components so that they gained an intrinsic ability to represent audio parametrically. In a recent paper we described a first validation model to such an approach applied to parametric coding of wideband speech. In this paper we describe specific advances to such an approach that improve coding efficiency and signal quality. A special focus is devoted to the fact that transmission to the decoder of any phase information is avoided, and that direct synthesis in the time-domain of the periodic content of speech is allowed in order to cope with fast F0 changes. A few examples of signal coding and transformation illustrate the impact of those improvements.
Convention Paper 9509

P5-5 Visual Information Search in Digital Audio Workstations
Joshua Mycroft, Queen Mary University of London - London, UK; Tony Stockman, Queen Mary University of London - London, UK; Joshua D. Reiss, Queen Mary University of London - London, UK
As the amount of visual information within Digital Audio Workstations increases, the interface potentially becomes more cluttered and time consuming to navigate. The increased graphical information may tax the available display space and potentially overload visual perceptual and attentional bandwidth. This study investigates the extent to which Dynamic Query filters (sliders, buttons, and other filters) can be used in audio mixing interfaces to improve both visual search times and concurrent critical listening tasks (identifying subtle attenuation of named instruments in a multichannel mix). The results of the study suggest that including Dynamic Query filters leads to a higher number of correctly completed visual and aural tasks.
Convention Paper 9510

P5-6 AC-4 – The Next Generation Audio Codec
Kristofer Kjörling, Dolby Sweden AB - Stockholm, Sweden; Jonas Rödén, Dolby Sweden AB - Stockholm, Sweden; Martin Wolters, Dolby Germany GmbH - Nuremberg, Germany; Jeff Riedmiller, Dolby Laboratories - San Francisco, CA, USA; Arijit Biswas, Dolby Germany GmbH - Nuremberg, Germany; Per Ekstrand, Dolby Sweden AB - Stockholm, Sweden; Alexander Gröschel, Dolby Germany GmbH - Nuremberg, Germany; Per Hedelin, Dolby Sweden AB - Stockholm, Sweden; Toni Hirvonen, Dolby Laboratories - Stockholm, Sweden; Holger Hörich, Dolby Germany GmbH - Nuremberg, Germany; Janusz Klejsa, Dolby Sweden AB - Stockholm, Sweden; Jeroen Koppens, Dolby Sweden AB - Stockholm, Sweden; K. Krauss, Dolby Germany GmbH - Nuremberg, Germany; Heidi-Maria Lehtonen, Dolby Sweden AB - Stockholm, Sweden; Karsten Linzmeier, Dolby Germany GmbH - Nuremberg, Germany; Hannes Muesch, Dolby Laboratories, Inc. - San Francisco, CA, USA; Harald Mundt, Dolby Germany GmbH - Nuremberg, Germany; Scott Norcross, Dolby Laboratories - San Francisco, CA, USA; J. Popp, Dolby Germany GmbH - Nuremberg, Germany; Heiko Purnhagen, Dolby Sweden AB - Stockholm, Sweden; Jonas Samuelsson, Dolby Sweden AB - Stockholm, Sweden; Michael Schug, Dolby Germany GmbH - Nuremberg, Germany; L. Sehlström, Dolby Sweden AB - Stockholm, Sweden; R. Thesing, Dolby Germany GmbH - Nuremberg, Germany; Lars Villemoes, Dolby Sweden - Stockholm, Sweden; Mark Vinton, Dolby - San Francisco, CA, USA
AC-4 is a state-of-the-art audio codec standardized by ETSI (TS103 190 and TS103 190-2); TS103 190 is also part of the DVB toolbox (TS101 154). AC-4 is designed to address the current and future needs of video and audio entertainment services, including broadcast and Internet streaming. As such, it incorporates a number of features beyond traditional audio coding algorithms, such as support for immersive and personalized audio, advanced loudness management, video-frame synchronous coding, and dialogue enhancement. This paper outlines the thinking behind the design of the AC-4 codec, explains the different coding tools used and the systemic features included, and gives an overview of performance and applications. [Also a lecture: see session P2-3]
Convention Paper 9491

P5-7 Single-Channel Audio Source Separation Using Deep Neural Network Ensembles
Emad M. Grais, University of Surrey - Guildford, Surrey, UK; Gerard Roma, University of Surrey - Guildford, Surrey, UK; Andrew J. R. Simpson, University of Surrey - Guildford, Surrey, UK; Mark D. Plumbley, University of Surrey - Guildford, Surrey, UK
Deep neural networks (DNNs) are often used to tackle the single channel source separation (SCSS) problem by predicting time-frequency masks. The predicted masks are then used to separate the sources from the mixed signal. Different types of masks produce separated sources with different levels of distortion and interference. Some types of masks produce separated sources with low distortion, while other masks produce low interference between the separated sources. In this paper a combination of different DNNs’ predictions (masks) is used for SCSS to achieve better quality of the separated sources than using each DNN individually. We train four different DNNs by minimizing four different cost functions to predict four different masks. The first and second DNNs are trained to approximate reference binary and soft masks. The third DNN is trained to predict a mask from the reference sources directly. The last DNN is trained similarly to the third DNN but with an additional discriminative constraint to maximize the differences between the estimated sources. Our experimental results show that combining the predictions of different DNNs achieves separated sources with better quality than using each DNN individually. [Also a lecture: see session P2-6]
Convention Paper 9494
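
The reference binary and soft masks mentioned in the P5-7 abstract can be illustrated with a minimal sketch over a handful of time-frequency magnitudes. The mask definitions below are the common textbook forms, and the plain averaging in `combine_masks` is an assumption chosen for simplicity; the abstract does not state how the paper combines the DNN predictions. All function names are hypothetical.

```python
def binary_mask(target, interferer):
    """1.0 where the target magnitude dominates the bin, else 0.0."""
    return [1.0 if t > i else 0.0 for t, i in zip(target, interferer)]


def soft_mask(target, interferer, eps=1e-12):
    """Ratio mask: the target's share of the summed magnitudes in each bin."""
    return [t / (t + i + eps) for t, i in zip(target, interferer)]


def combine_masks(masks):
    """Average several masks bin by bin (assumed combination rule)."""
    return [sum(values) / len(masks) for values in zip(*masks)]


def apply_mask(mask, mixture):
    """Scale each mixture bin by its mask value to estimate the target."""
    return [m * x for m, x in zip(mask, mixture)]


if __name__ == "__main__":
    # Toy magnitudes for four time-frequency bins (made-up values).
    source_1 = [3.0, 1.0, 0.0, 2.0]
    source_2 = [1.0, 1.0, 4.0, 2.0]
    mixture = [a + b for a, b in zip(source_1, source_2)]

    combined = combine_masks([binary_mask(source_1, source_2),
                              soft_mask(source_1, source_2)])
    print(apply_mask(combined, mixture))
```

In this toy setting the binary mask removes interference outright but discards bins where the sources tie, while the soft mask keeps a share of every bin. That is the trade-off between distortion and interference the abstract describes.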

P5-8 Using Phase Information to Improve the Reconstruction Accuracy in Sinusoidal Modeling
Clara Hollomey, Glasgow Caledonian University - Glasgow, Scotland, UK; David Moore, Glasgow Caledonian University - Glasgow, Lanarkshire, UK; Don Knox, Glasgow Caledonian University - Glasgow, Scotland, UK; W. Owen Brimijoin, MRC/CSO Institute of Hearing Research - Glasgow, Scotland, UK; William Whitmer, MRC/CSO Institute of Hearing Research - Glasgow, Scotland, UK
Sinusoidal modeling is one of the most common techniques for general purpose audio synthesis and analysis. Owing to the ever increasing amount of available computational resources, nowadays practically all types of sounds can be constructed up to a certain degree of perceptual accuracy. However, the method is computationally expensive and can in some cases, particularly for transient signals, still exceed the available computational resources. In this work methods derived from the realm of machine learning are exploited to provide a simple and efficient means to estimate the achievable reconstruction quality. The peculiarities of common classes of musical instruments are discussed and, finally, the existing metrics are extended by information on the signal's phase propagation to allow for more accurate estimations. [Also a lecture: see session P2-4]
Convention Paper 9492

P5-9 Just Noticeable Difference of Interaural Level Difference to Frequency and Interaural Level Difference
Heng Wang, Wuhan Polytechnic University - Wuhan, Hubei, China; Cong Zhang, Wuhan Polytechnic University - Wuhan, Hubei, China; Yafei Wu, Wuhan University - Wuhan, Hubei, China
In order to explore the perceptual mechanism of Interaural Level Difference (ILD) and to study how the ILD limen depends on frequency and on the reference ILD, eight ILD values were selected according to a qualitative analysis of the sensitivity of the human ear to ILD. The frequency range was divided into 24 critical bands, and the center frequency of each band was tested. The experiment adopted traditional test methods (1 up/2 down and 2AFC). The results showed that the ILD thresholds vary markedly with frequency: they are smaller at 500 Hz and 4000 Hz and reach a maximum at about 1000 Hz. The thresholds also increase as the reference ILD values increase. This work provides basic data for a comprehensive exploration of the perceptual characteristics of the human ear and theoretical support for efficient audio compression.
Convention Paper 9511
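
The 1 up/2 down procedure named in the P5-9 abstract can be sketched as a generic adaptive staircase: the level difference drops after two consecutive correct 2AFC answers and rises after any wrong answer, which tends to converge near the 70.7% correct point. The start level, step size, reversal count, and the deterministic simulated listener are all illustrative assumptions, not the settings used in the paper.

```python
def staircase_1up_2down(respond, start, step, n_reversals=6,
                        floor=0.0, max_trials=1000):
    """Estimate a threshold as the mean level at the staircase reversals.

    respond(level) must return True when the 2AFC trial at that level
    was answered correctly.
    """
    level = start
    correct_in_row = 0
    direction = 0  # -1 while descending, +1 while ascending, 0 before any move
    reversals = []
    for _ in range(max_trials):
        if len(reversals) >= n_reversals:
            break
        if respond(level):
            correct_in_row += 1
            if correct_in_row < 2:
                continue  # a second correct answer is needed before stepping down
            correct_in_row = 0
            new_direction = -1
        else:
            correct_in_row = 0
            new_direction = 1
        if direction != 0 and new_direction != direction:
            reversals.append(level)  # the track changed direction at this level
        direction = new_direction
        level = max(floor, level + new_direction * step)
    return sum(reversals) / len(reversals) if reversals else level


if __name__ == "__main__":
    # Idealized listener (assumption): always correct at or above 1.0 dB.
    def listener(level_db):
        return level_db >= 1.0

    print(staircase_1up_2down(listener, start=3.0, step=0.5))
```

With this idealized listener the track oscillates between 0.5 dB and 1.0 dB, so the estimate lands between the two. A real listener answers probabilistically, and the reversal levels scatter around the threshold instead.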

