AES San Francisco 2010
Poster Session P19
P19 - Spatial Sound Processing—1
Saturday, November 6, 2:30 pm — 4:00 pm (Room 226)
P19-1 Estimation of the Probability Density Function of the Interaural Level Differences for Binaural Speech Separation—David Ayllon, Roberto Gil-Pita, Manuel Rosa-Zurera, University of Alcalá - Alcalá de Henares (Madrid), Spain
Source separation techniques are applied to audio signals to recover several sources from a single mixture. One important challenge in speech processing is noise suppression, and several methods have been proposed. However, in some applications, such as hearing aids, the goal is not merely to remove noise from speech but to amplify the speech while attenuating the noise. A novel method based on estimating the Probability Density Function of the Interaural Level Differences, in conjunction with time-frequency decomposition and binary masking, is applied to speech-noise mixtures in order to recover both signals separately. Results show that the two signals are clearly separated, and the method entails a low computational cost, so it could be implemented in a real-time environment such as a hearing aid.
Convention Paper 8273
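As a rough illustration of the kind of ILD-driven time-frequency masking described in P19-1 (a sketch of the general technique, not the authors' algorithm), the Python fragment below computes per-bin Interaural Level Differences from an STFT of a two-channel mixture, estimates their distribution with a simple histogram, and keeps the bins whose ILD exceeds a threshold. The STFT parameters, the histogram-based density estimate, and the fixed threshold are all assumptions.

```python
# Sketch only: ILD-based binary masking for a binaural speech-noise mixture.
import numpy as np
from scipy.signal import stft, istft

def ild_binary_mask(left, right, fs, threshold_db=0.0):
    """Keep time-frequency bins whose ILD exceeds threshold_db (assumed rule)."""
    f, t, L = stft(left, fs, nperseg=512)
    _, _, R = stft(right, fs, nperseg=512)
    eps = 1e-12
    # Interaural level difference per time-frequency bin, in dB.
    ild = 20.0 * np.log10((np.abs(L) + eps) / (np.abs(R) + eps))
    # Crude stand-in for the PDF estimate: a normalized histogram of all ILDs.
    pdf, edges = np.histogram(ild.ravel(), bins=100, density=True)
    # Binary mask: bins dominated by the source nearer the left ear.
    mask = ild > threshold_db
    _, target_l = istft(L * mask, fs, nperseg=512)
    _, target_r = istft(R * mask, fs, nperseg=512)
    return target_l, target_r, (pdf, edges)
```

In a real system the decision rule would be derived from the estimated density rather than a fixed threshold, which is what makes the PDF estimate central to the paper's method.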
P19-2 The Learning Effect of HRTF-Based 3-D Sound Perception with a Horizontally Arranged 8-Loudspeaker System—Akira Saji, Keita Tanno, Li Huakang, Tetsuya Watanabe, Jie Huang, The University of Aizu - Aizuwakamatsu City, Fukushima, Japan
This paper examines the learning effect on the localization of HRTF-based 3-D sound using an 8-channel loudspeaker system that creates virtual sound images. The system produces elevated sound with eight loudspeakers arranged on the horizontal plane by convolving the source signals with HRTFs, without using loudspeakers mounted above or below the listener. The position of a sound image the system creates is initially difficult to perceive because such HRTF-based sounds are unfamiliar. After repeated training, however, almost all listeners can localize the sound images more accurately. This paper demonstrates this learning effect for an HRTF-based 3-D sound system.
Convention Paper 8274
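As context for the rendering idea in P19-2, the sketch below convolves a mono source with a head-related impulse response (HRIR) pair for the target direction and routes the two ear signals to the nearest adjacent pair of loudspeakers in a horizontal ring. This is a simplified stand-in under assumed generic HRIRs; the paper's actual 8-channel mapping and any cross-talk handling are not reproduced.

```python
# Sketch only: virtual elevated source over a horizontal 8-speaker ring.
import numpy as np
from scipy.signal import fftconvolve

def virtual_elevated_source(mono, hrir_l, hrir_r, azimuth_deg, n_speakers=8):
    """Route HRIR-convolved ear signals to a horizontal loudspeaker ring.

    hrir_l / hrir_r: equal-length impulse responses measured for the target
    elevation and azimuth (assumed available). The ear signals are assigned
    to the adjacent speaker pair nearest the azimuth, a simplification.
    """
    ear_l = fftconvolve(mono, hrir_l)
    ear_r = fftconvolve(mono, hrir_r)
    feeds = np.zeros((n_speakers, ear_l.size))
    spacing = 360.0 / n_speakers              # 45 degrees for 8 speakers
    k = int(azimuth_deg // spacing) % n_speakers
    feeds[k] = ear_l                          # left-ear signal
    feeds[(k + 1) % n_speakers] = ear_r       # right-ear signal
    return feeds
```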
P19-3 Spatial Audio Attention Model Based Surveillance Event Detection—Bo Hang, Ruimin Hu, Xiaochen Wang, Weiping Tu, Wuhan University - Wuhan, China
In this paper we propose a bottom-up audio attention model based on spatial audio cues and subband energy change for unsupervised event detection in stereo audio surveillance. First, the spatial audio parameter Interaural Level Difference (ILD) is extracted to detect and represent attention events caused by rapidly moving sound sources. Then the subband energy change is computed to capture salient changes in the energy distribution in the frequency domain. Finally, an environment-adaptive normalization is applied to obtain the normalized attention level. Experimental results demonstrate that the proposed audio attention model is effective for audio surveillance event detection.
Convention Paper 8275
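A minimal sketch of the two bottom-up cues the abstract names: frame-to-frame movement of the ILD (capturing rapid source motion) and subband energy change, combined after a simple normalization. The frame size, band split, and median-based normalization below are illustrative assumptions, not the authors' parameters.

```python
# Sketch only: a bottom-up attention curve from stereo surveillance audio.
import numpy as np
from scipy.signal import stft

def attention_curve(left, right, fs, n_bands=8):
    f, t, L = stft(left, fs, nperseg=1024)
    _, _, R = stft(right, fs, nperseg=1024)
    eps = 1e-12
    ild = 20.0 * np.log10((np.abs(L) + eps) / (np.abs(R) + eps))
    # Cue 1: how quickly the spatial parameter moves between frames.
    ild_change = np.abs(np.diff(ild.mean(axis=0)))
    # Cue 2: log-energy change per frequency subband.
    power = np.abs(L) ** 2 + np.abs(R) ** 2
    bands = np.array_split(power, n_bands, axis=0)
    band_energy = np.stack([b.sum(axis=0) for b in bands])
    energy_change = np.abs(np.diff(np.log(band_energy + eps), axis=1)).sum(axis=0)
    # Stand-in for the environment-adaptive normalization: a running scale.
    def norm(x):
        return x / (np.median(x) + eps)
    return norm(ild_change) + norm(energy_change)
```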
P19-4 Investigating Perceptual Effects Associated with Vertically Extended Sound Fields Using Virtual Ceiling Speaker—Yusuke Ono, Sungyoung Kim, Masahiro Ikeda, Yamaha Corporation - Iwata, Shizuoka, Japan
Virtual Ceiling Speaker (VCS) is a signal processing method that creates an elevated auditory image using optimized cross-talk compensation for a 5-channel reproduction system. To understand the latent perceptual effects of virtually elevated sound images, we experimentally compared the perceptual differences between physically and virtually elevated sound sources in terms of apparent source width (ASW), listener envelopment (LEV), Powerfulness, and Clarity. The results showed that adding either physically or virtually elevated early reflections increased the perceived LEV and Clarity relative to the 5-channel content. This suggests that attributes related to the spatial dimensions were relatively well conveyed by the virtually elevated signals produced with VCS.
Convention Paper 8276
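The cross-talk compensation that VCS builds on can be sketched as a regularized matrix inversion per frequency bin. The formulation below is the textbook least-squares canceller for an assumed 2x2 loudspeaker-to-ear transfer matrix; the paper's optimized design for a 5-channel layout is not reproduced here.

```python
# Sketch only: frequency-domain cross-talk cancellation, textbook form.
import numpy as np

def crosstalk_canceller(H, beta=0.005):
    """Regularized inverse C = (H^H H + beta I)^-1 H^H, per frequency bin.

    H: array of shape (n_freq, 2, 2), loudspeaker-to-ear transfer functions.
    Returns C of shape (n_freq, 2, 2) mapping binaural signals to speaker feeds.
    beta is an assumed regularization constant limiting filter gain.
    """
    Hh = np.conj(np.transpose(H, (0, 2, 1)))       # Hermitian transpose
    I = np.eye(2)
    return np.linalg.solve(Hh @ H + beta * I, Hh)  # batched 2x2 solve
```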
P19-5 Enhancing 3-D Audio Using Blind Bandwidth Extension—Tim Habigt, Marko Durkovic, Martin Rothbucher, Klaus Diepold, Technische Universität München - München, Germany
Blind bandwidth extension techniques recreate the high-frequency bands of a narrowband audio signal. These methods can increase the perceived quality of signals transmitted over a narrow frequency band, as in telephone or radio communication systems. We evaluate the possibility of using blind bandwidth extension methods in 3-D audio applications, where high-frequency components are necessary to create the impression of elevated sound sources.
Convention Paper 8277
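One common blind bandwidth extension strategy, sketched below, is spectral replication: the band just below the cutoff is copied into the missing high band and attenuated with a fixed rolloff. Genuine blind methods estimate the high-band envelope from the narrowband signal itself; the cutoff frequency and rolloff used here are assumptions for illustration.

```python
# Sketch only: blind bandwidth extension by spectral replication.
import numpy as np
from scipy.signal import stft, istft

def blind_bwe(x, fs, cutoff_hz=4000.0, rolloff_db_per_octave=12.0):
    """Regenerate the band above cutoff_hz by copying the band below it."""
    f, t, X = stft(x, fs, nperseg=1024)
    k_cut = int(np.searchsorted(f, cutoff_hz))   # first bin at/above cutoff
    n_missing = X.shape[0] - k_cut
    # Source material: the region just below the cutoff, tiled if too short.
    src = X[max(k_cut - n_missing, 0):k_cut, :]
    reps = int(np.ceil(n_missing / src.shape[0]))
    src = np.tile(src, (reps, 1))[:n_missing, :]
    # Fixed spectral rolloff above the cutoff (a crude envelope model).
    octaves = np.log2(f[k_cut:] / cutoff_hz)
    gain = 10.0 ** (-rolloff_db_per_octave * octaves / 20.0)
    X_ext = X.copy()
    X_ext[k_cut:, :] = src * gain[:, None]
    _, y = istft(X_ext, fs, nperseg=1024)
    return y
```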