AES New York 2015
Poster Session P18

P18 - Recording & Production

Saturday, October 31, 4:15 pm — 5:45 pm (S-Foyer 1)

P18-1 The Impact of Subgrouping Practices on the Perception of Multitrack Music Mixes—David Ronan, Queen Mary University of London - London, UK; Brecht De Man, Queen Mary University of London - London, UK; Hatice Gunes, Queen Mary University of London - London, UK; Joshua D. Reiss, Queen Mary University of London - London, UK
Subgrouping is an important part of the mix engineering workflow that facilitates the process of manipulating a number of audio tracks simultaneously. We statistically analyze the subgrouping practices of mix engineers in order to establish the relationship between subgrouping and mix preference. We investigate the number of subgroups (relative and absolute), the type of audio processing, and the subgrouping strategy in 72 mixes of 9 songs, by 16 mix engineers. We analyze the subgrouping setup for each mix of a particular song and also each mix by a particular mixing engineer. We show that subjective preference for a mix strongly correlates with the number of subgroups and, to a lesser extent, which types of audio processing are applied to the subgroups.
Convention Paper 9442
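The abstract reports that preference correlates with the number of subgroups, but it does not say which correlation coefficient was used. The following is a minimal sketch that assumes a rank (Spearman) correlation between per-mix subgroup counts and per-mix preference ratings. The function names are hypothetical, and no data from the study is reproduced.

```python
from statistics import mean


def average_ranks(values: list[float]) -> list[float]:
    """Rank values 1..n; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation; raises ZeroDivisionError if either input is constant."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def spearman(x: list[float], y: list[float]) -> float:
    """Spearman's rho: the Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))
```

Ties are likely when counting subgroups, because many mixes can share the same count. That is why tied values receive averaged ranks instead of arbitrary ones.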

P18-2 MixViz: A Tool to Visualize Masking in Audio Mixes—Jon Ford, Northwestern University - Evanston, IL, USA; Mark Cartwright, Northwestern University - Evanston, IL, USA; Bryan Pardo, Northwestern University - Evanston, IL, USA
This paper presents MixViz, a real-time audio production tool that helps users visually detect and eliminate masking in audio mixes. This work adapts the Glasberg and Moore time-varying Model of Loudness and Partial Loudness to analyze multiple audio tracks for instances of masking. We extend the Glasberg and Moore model to allow it to account for spatial release from masking effects. Each audio track is assigned a hue and visualized in a 2-dimensional display where the horizontal dimension is spatial location (left to right) and the vertical dimension is frequency. Masking between tracks is indicated via a change of color. The user can quickly drag and drop tracks into and out of the mix visualization to observe the effects on masking. This lets the user intuitively see which tracks are masked in which frequency ranges and take action accordingly. This tool has the potential to both make mixing easier for novices and improve the efficiency of expert mixers.
Convention Paper 9443
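MixViz relies on the Glasberg and Moore partial-loudness model, which is far too involved to reproduce here. The sketch below swaps in a deliberately crude heuristic that only illustrates the display's logic: a track is masked in a frequency band when another track is louder there. It also shows how moving the tracks apart in the stereo field reduces masking. The `Track` type, the linear spatial-release term and the default parameter values are all hypothetical assumptions, not the paper's extension of the model.

```python
from dataclasses import dataclass


@dataclass
class Track:
    name: str
    pan: float                   # -1.0 (hard left) .. +1.0 (hard right)
    band_levels_db: list[float]  # one level per frequency band, same bands for every track


def masked_bands(tracks: list[Track], max_release_db: float = 6.0,
                 threshold_db: float = 0.0) -> dict[str, list[int]]:
    """Return {track name: indices of bands where that track is flagged as masked}.

    Hypothetical heuristic: another track masks band b of the target if its level
    in b, reduced by a spatial-release term that grows linearly with pan distance,
    exceeds the target's level in b by more than threshold_db.
    """
    result: dict[str, list[int]] = {}
    for target in tracks:
        flagged = []
        for b, target_db in enumerate(target.band_levels_db):
            for other in tracks:
                if other is target:
                    continue
                # Pan distance runs from 0 (same position) to 2 (opposite sides).
                release_db = max_release_db * abs(other.pan - target.pan) / 2.0
                if other.band_levels_db[b] - release_db - target_db > threshold_db:
                    flagged.append(b)
                    break  # one masker is enough to flag this band
        result[target.name] = flagged
    return result
```

In a display like the one described, each flagged (track, band) pair would be drawn with a changed color at the track's pan position. Dragging a track to a new pan value reruns the check.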

P18-3 Sound Capture Technical Parameters of Colombian Folk Music Instruments for Virtual Sound Banks Use—Carlos Andrés Caballero Parra, Instituto Tecnológico Metropolitano - Medellín, Antioquia, Colombia; Jamir Mauricio Moreno Espinal, Sr., Instituto Tecnológico Metropolitano - Medellín, Antioquia, Colombia
This paper describes an appropriate way to handle the technical parameters required for the digital sound capture of Colombian folk music instruments. It takes into account the particular characteristics of each instrument and the audio file formats currently used in virtual sound banks. The paper does not propose new capture techniques or microphone placements. Instead, it applies well-known methods to obtain precise, clean takes that yield a large number of audio samples for building sound banks usable in music software and as virtual instruments. The tests and analyses carried out showed that capture should cover the instrument's full range. It should also use flat-frequency-response microphones, high-resolution digital conversion (96 kHz/24-bit), and both near and distant stereo recordings, all in acoustically controlled, well-treated rooms.
Convention Paper 9444
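The recommended 96 kHz/24-bit format has concrete storage and bandwidth consequences. The arithmetic below assumes uncompressed PCM (for example, WAV) and a four-channel session made of one near stereo pair and one distant stereo pair. The abstract does not state channel counts, so that session layout is a hypothetical example. The 6.02 N + 1.76 dB figure is the textbook ideal for a full-scale sine wave, not a measured converter specification.

```python
def pcm_bytes_per_second(sample_rate_hz: int, bit_depth: int, channels: int) -> int:
    """Data rate of uncompressed PCM: one bit_depth-bit sample per channel per period."""
    return sample_rate_hz * bit_depth * channels // 8


def nyquist_hz(sample_rate_hz: int) -> float:
    """Highest frequency representable without aliasing."""
    return sample_rate_hz / 2


def ideal_quantization_snr_db(bit_depth: int) -> float:
    """Theoretical SNR of an ideal N-bit quantizer for a full-scale sine: 6.02 N + 1.76 dB."""
    return 6.02 * bit_depth + 1.76


# Hypothetical session: one near stereo pair plus one distant stereo pair.
channels = 4
rate = pcm_bytes_per_second(96_000, 24, channels)
print(rate)                                     # 1152000 bytes per second
print(rate * 3600 / 1e9)                        # 4.1472 (GB per hour)
print(nyquist_hz(96_000))                       # 48000.0
print(round(ideal_quantization_snr_db(24), 2))  # 146.24
```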

P18-4 Vocal Clarity in the Mix: Techniques to Improve the Intelligibility of Vocals—Yuval Ronen, New York University - New York, NY, USA
Common mixing techniques for improving the intelligibility of vocals were gathered from interviews with leading professional mixing engineers and from a review of the literature in the field. An experiment testing these techniques was conducted with randomly selected participants who had normal hearing and no mixing or recording expertise. The results showed statistically significant differences between audio clips processed with these techniques and unprocessed clips. To the author's knowledge, this is the first study of its kind to show, with statistical significance, that certain common mixing techniques improve the intelligibility of vocals in popular music as perceived by human listeners.
Convention Paper 9445
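The abstract does not name the statistical test used to compare processed and unprocessed clips. One common choice for paired judgments is an exact two-sided sign test. The sketch below assumes a hypothetical design in which each participant picks the more intelligible clip of each processed/unprocessed pair, with ties discarded beforehand. The actual study may have measured intelligibility differently.

```python
from math import comb


def sign_test_p_value(wins: int, losses: int) -> float:
    """Exact two-sided sign test for paired preferences (ties already removed).

    Under the null hypothesis each pair is a fair coin flip (p = 0.5); the p-value
    is twice the probability of a split at least as lopsided as the observed one,
    capped at 1.
    """
    n = wins + losses
    k = min(wins, losses)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)
```

For example, if the processed clip were preferred in 9 of 10 hypothetical pairs, the p-value would be 22/1024, or about 0.021.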

P18-5 Affective Potential in Vocal Production—Duncan Williams, University of Plymouth - Devon, UK
The study of affect in music psychology – broadly construed as emotional responses communicated to, or induced in, the listener – increasingly concludes that voice processing can provide a powerful vector for emotional communication in the music production chain. The audio engineer has the ability to create a “definitive article” in the studio that gives listeners an opportunity to engage with the recorded voice in a manner that is quite distinct from everyday speech or the effect that might be achieved in a typical live performance. This paper examines the affective potential of the voice in a number of examples from popular music where the production chain has been exploited to provide a technological mediation to the listener’s emotional response.
Convention Paper 9446

P18-6 Sample-Rate Variance across Portable Digital Audio Recorders—Robert Oldfield, University of Salford - Salford, Greater Manchester, UK; Paul Kendrick, University of Salford - Salford, UK
In recent years there has been an increase in the use of portable digital recording devices for making informal or in-situ recordings, such as smartphones, tablets, dictaphones and other portable hand-held recorders. It is often not possible to connect an external clock signal to these devices, so their recordings are affected by deviations of the device's actual clock rate from the expected rate. This variation causes problems when synchronizing signals from multiple recording devices and can prevent the use of some signal processing algorithms. This paper presents a novel methodology for determining the actual clock rate of a digital recording device. The method optimizes the correlation between a recording and a ground-truth signal over varying degrees of temporal stretching. The paper further discusses the effects of sampling-frequency variation on typical applications. The sampling rates of a range of commonly used mobile audio recording devices were found to deviate from the nominal 48 kHz, with a standard deviation of 0.8172 Hz. For a single device type used for long-term logging of bio-acoustic signals, the standard deviation of sampling rates was found to be 0.1983 Hz (at a nominal sampling rate of 48 kHz).
Convention Paper 9470
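The core idea in the abstract is to stretch a recording by a range of candidate clock rates and keep the rate whose stretched version correlates best with a ground-truth signal. The following is a minimal brute-force sketch of that search. It assumes that both signals start at the same instant, that the reference is sampled at exactly the nominal rate, and that linear interpolation is adequate for resampling. These are all simplifications; the paper's actual optimization and resampling method are not described in the abstract, and the function names are hypothetical.

```python
import math
from typing import Optional


def linear_interp(signal: list[float], pos: float) -> Optional[float]:
    """Value of signal at fractional index pos, or None past the last full interval."""
    i = int(pos)
    if i + 1 >= len(signal):
        return None
    frac = pos - i
    return signal[i] * (1.0 - frac) + signal[i + 1] * frac


def normalized_correlation(a: list[float], b: list[float]) -> float:
    """Zero-lag normalized cross-correlation of two equal-length signals."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))


def estimate_clock_rate(reference: list[float], recording: list[float],
                        nominal_hz: float, candidates_hz) -> float:
    """Return the candidate rate whose time-stretch best aligns recording with reference."""
    best_rate, best_score = None, -math.inf
    for rate in candidates_hz:
        # Reference sample m (time m / nominal_hz) sits at fractional index
        # m * rate / nominal_hz in a recording actually clocked at `rate`.
        step = rate / nominal_hz
        ref, warped = [], []
        for m, value in enumerate(reference):
            v = linear_interp(recording, m * step)
            if v is None:
                break  # ran off the end of the recording
            ref.append(value)
            warped.append(v)
        score = normalized_correlation(ref, warped)
        if score > best_score:
            best_rate, best_score = rate, score
    return best_rate
```

Drift of the size the paper reports accumulates quickly: an offset of 0.8172 Hz at 48 kHz is about 17 ppm, or roughly 61 ms of drift per hour of recording. A practical version would likely search a much finer, sub-hertz grid and use faster correlation. The coarse integer grid here only works because a test would exaggerate the deviation (for example, 1% instead of 17 ppm).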

P18-7 Comparison of Loudness Features for Automatic Level Adjustment in Mixing—Gordon Wichern, iZotope, Inc. - Cambridge, MA, USA; Aaron Wishnick, iZotope - Cambridge, MA, USA; Alexey Lukin, iZotope, Inc. - Cambridge, MA, USA; Hannah Robertson, iZotope - Cambridge, MA, USA
Manually setting the level of each track of a multitrack recording is often the first step in the mixing process. In order to automate this process, loudness features are computed for each track and gains are algorithmically adjusted to achieve target loudness values. In this paper we first examine human mixes from a multitrack dataset to determine instrument-dependent target loudness templates. We then use these templates to develop three different automatic level-based mixing algorithms. The first is based on a simple energy-based loudness model, the second uses a more sophisticated psychoacoustic model, and the third incorporates masking effects into the psychoacoustic model. The three automatic mixing approaches are compared to human mixes using a subjective listening test. Results show that subjects preferred the automatic mixes created from the simple energy-based model, indicating that the complex psychoacoustic model may not be necessary in an automated level setting application.
Convention Paper 9370
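The simplest approach in the abstract computes an energy-based loudness feature per track and sets each gain to reach an instrument-dependent target. The sketch below assumes that the feature is the plain RMS level in dB over the whole track and that the templates are absolute dBFS targets. The abstract gives neither detail, and any target values passed in are hypothetical. Plain RMS is not the same as a BS.1770 loudness measurement, which adds K-weighting and gating.

```python
import math


def rms_db(samples: list[float]) -> float:
    """Energy-based loudness proxy: RMS level in dB relative to full scale."""
    mean_square = sum(s * s for s in samples) / len(samples)
    return 10 * math.log10(max(mean_square, 1e-12))  # floor avoids log10(0) on silence


def level_gains_db(tracks: dict[str, list[float]],
                   targets_db: dict[str, float]) -> dict[str, float]:
    """Gain in dB that moves each track's RMS level onto its (hypothetical) target."""
    return {name: targets_db[name] - rms_db(x) for name, x in tracks.items()}


def apply_gain(samples: list[float], gain_db: float) -> list[float]:
    """Scale samples by a gain given in dB (amplitude factor 10^(dB/20))."""
    g = 10 ** (gain_db / 20)
    return [s * g for s in samples]
```

Because this feature is a single energy measure, each gain is a simple subtraction in the dB domain. The psychoacoustic variants described in the abstract would replace `rms_db` with a model-based loudness estimate, which can no longer be inverted this simply.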


Audio Engineering Society
