Meeting Topic: Spatial Audio — Reconstructing Reality or Creating Illusion?
Moderator Name: Kerry Haps
Speaker Name: Francis Rumsey
Meeting Location: Niles, IL (Shure SN theater)
The May 21, 2014, meeting of the Chicago AES Section was held at Shure Incorporated in Niles, Illinois. Twenty members and fourteen non-members attended the meeting, which featured Francis Rumsey's presentation, "Spatial Audio — Reconstructing Reality or Creating Illusion?"
If, after reading these highlights, you find yourself wanting more, rest assured that the video of Francis Rumsey's presentation is now posted on YouTube: https://www.youtube.com/watch?v=y82nth2Pnwk
Francis Rumsey is an independent technical writer and consultant based in the UK. Until 2009 he was Professor and Director of Research at the Institute of Sound Recording, University of Surrey, specializing in sound quality, psychoacoustics, and spatial audio. He is currently chair of the AES Technical Council and Consultant Technical Writer and Editor for the AES Journal.
Knowing that he was going to be in Chicago in May, and being a consummate AES member, Francis contacted our section committee and offered to repeat the presentation that he had recently given at the 136th International AES Convention in Germany.
After Bob Schulein introduced him as "knowing a lot about a lot of things," Francis began his presentation by noting that for over 100 years, a battlefield has existed between those who want sound to be an accurate reconstruction and those who want it simply to sound nice. Only 5 percent of the recordings sold in the commercial market are actually made in a 'natural space'. Hence, 95 percent are 'manufactured' in the studio.
Francis reviewed several topics, including:
• Why two-channel stereo should sound terrible but doesn't, and why we are all so focused on stereo localization of the phantom image;
• Timbral Fidelity (crucial, as it accounts for 78 percent of sound quality) vs. Frontal Spatial Fidelity vs. Surround Spatial Fidelity (immersion);
• Reflections of 'natural' versus 'reproduced' sounds:
o Instruments not only have their own unique sounds, but their sound reflections are unique as well;
o If you use a loudspeaker to recreate an instrument's sound, the reflections are from the loudspeaker and not the instrument;
• Are more loudspeakers the answer? Possibly not (the result is still only an approximation);
• 3D or Immersive Sound Systems currently have no guiding principle for how to generate or pan the signals that feed them;
• The Soundfield Synthesis approach (or the scientific approach) is:
o Often in the horizontal plane;
o Limited by spatial aliasing at high frequencies;
o Explicit about the loudspeaker driving functions;
• Huygens' Principle and Wave Field Synthesis (WFS):
o Focused sources creating a 'holographic source' out of the middle of the room;
o Leads us to believe that more speakers and alignment do NOT produce a preferable listening experience (keying on naturalness and fidelity);
• The 'uncanny valley', used to describe what appears natural versus manufactured in visual or computer-generated graphics, may also apply to audio:
o We may get to a point on the curve of Mori's Graph (http://en.wikipedia.org/wiki/Uncanny_valley) where the sound gets close to being 'real' but is also somewhat 'freaky';
o It's almost as if we use different critical faculties to judge what we perceive, depending on whether we can relate the sound to a real or an unnatural one (from our experience).
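As a rough illustration of the spatial aliasing limit mentioned in the list above (this is not taken from the presentation itself), a common rule of thumb puts the frequency above which a uniformly spaced loudspeaker array can no longer synthesize a wavefront correctly at about c / (2 x spacing), where c is the speed of sound. The sketch below assumes that rule of thumb, a speed of sound of 343 m/s, and a few hypothetical driver spacings; the function name is made up for this example.

```python
# Assumed value: speed of sound in dry air at roughly 20 degrees C.
SPEED_OF_SOUND_M_S = 343.0


def aliasing_frequency_hz(spacing_m, speed_of_sound=SPEED_OF_SOUND_M_S):
    """Rule-of-thumb worst-case spatial aliasing frequency, f = c / (2 * spacing).

    spacing_m is the distance between adjacent loudspeakers in metres.
    """
    if spacing_m <= 0:
        raise ValueError("spacing_m must be positive")
    return speed_of_sound / (2.0 * spacing_m)


if __name__ == "__main__":
    # Hypothetical driver spacings, chosen only for illustration.
    for spacing_cm in (5, 10, 15, 20):
        f = aliasing_frequency_hz(spacing_cm / 100.0)
        print(f"{spacing_cm:>3} cm spacing -> about {f:,.0f} Hz")
```

Under these assumptions, even a fairly dense array with 10 cm between drivers aliases from roughly 1.7 kHz upward, which is one way to see why adding loudspeakers yields only an approximation of the intended sound field.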
Francis then offered our group some 'wise advice': Context is king; timbral quality trumps spatial quality; we need a universal spatial encoding format; Keep it Simple; and perfect sound reconstruction over a wide area is impossible.
When the audience asked questions, Francis offered the following observations:
• With synthesized voice and audio, we are trying to reproduce the perfect sound field. Our brain can hear the perfect audio, but if we cannot place the environment or context, or cannot relate to it through any direct experience, then we're 'confused'. Without a full sensory environment, reference, or context, we can easily be lost;
• It's difficult to say whether binaural recordings are ideal; the answer is complex. The concept is promising but has yet to hold up. They should be ideal, but they still have shortcomings;
• Wave Field Synthesis (WFS) can be good, but there are many bad examples. It is hampered by spatial aliasing and frequency limitations;
• When asked what his 2-channel speaker design would be, Francis reminded us that it is important to prioritize timbral over spatial quality and said that he would choose 5.1 or 2.0.
The Chicago AES Section would like to extend a special thank you to Francis Rumsey for taking time out of his visit to Chicago to present to our section.
Written By: Ken Platz