AES San Francisco 2010
Poster Session P8
Friday, November 5, 9:30 am — 11:00 am (Room 226)
Poster: P8 - Audio Processing—1
P8-1 Near and Far-Field Control of Focused Sound Radiation Using a Loudspeaker Array—Sangchul Ko, Youngtae Kim, Jung-Woo Choi, SAIT, Samsung Electronics Co. Ltd. - Gyeonggi-do, Korea
This paper proposes a sound manipulation technique that prevents unwanted eavesdropping and avoids disturbing others nearby when a multimedia device is used in a public place. The technique creates a spatial region of high acoustic potential energy at the listener’s position. To this end, the paper discusses the design of multichannel filters with a spatial directivity pattern for an arbitrary loudspeaker array configuration. First, some limitations of conventional beamforming techniques are presented; then a novel control strategy is suggested for reproducing a desired acoustic property in a spatial area of interest close to the loudspeaker array. The technique also allows an acoustic property to be controlled in an area relatively far from the array with a single objective function. To produce the desired shape of energy distribution precisely in both areas, a spatial weighting technique is introduced. The results are compared with those obtained by controlling each area separately.
Convention Paper 8198
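For orientation, the conventional beamforming baseline mentioned in the P8-1 abstract can be illustrated with delay-and-sum focusing. This is a minimal sketch of that textbook technique, not the control strategy proposed in the paper. The function name `focusing_delays`, the speed of sound value, and the 2-D geometry are assumptions made for illustration.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s; assumed value for air at room temperature


def focusing_delays(speaker_positions, focus_point):
    """Return one delay (in seconds) per loudspeaker for delay-and-sum focusing.

    Each loudspeaker is delayed so that all wavefronts arrive at
    focus_point at the same instant; the farthest loudspeaker gets
    zero delay and nearer ones wait for it.
    """
    distances = [math.dist(p, focus_point) for p in speaker_positions]
    farthest = max(distances)
    return [(farthest - d) / SPEED_OF_SOUND for d in distances]


# Hypothetical three-element line array focused 1 m in front of its centre.
array = [(-0.1, 0.0), (0.0, 0.0), (0.1, 0.0)]
delays = focusing_delays(array, (0.0, 1.0))
```

In this example the centre loudspeaker is closest to the focal point, so it receives the largest delay, while the two outer loudspeakers receive none.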
P8-2 A Real-Time implementation of a Novel Psychoacoustic Approach for Stereo Acoustic Echo Cancellation—Stefania Cecchi, Laura Romoli, Paolo Peretti, Francesco Piazza, Università Politecnica delle Marche - Ancona (AN), Italy
Stereo acoustic echo cancellers (SAECs) are used in teleconferencing systems to reduce undesired echoes originating from coupling between loudspeakers and microphones. The main difficulty is that high interchannel coherence prevents each pair of room acoustic paths from being uniquely identified. This paper proposes a real-time implementation of a novel SAEC approach based on the psychoacoustic effect of the missing fundamental. An adaptive algorithm tracks and removes the fundamental frequency of one of the two channels, ensuring continuous decorrelation without affecting the stereo quality. Several tests of a real-time implementation on a DSP framework are presented to confirm the effectiveness of the approach.
Convention Paper 8199
P8-3 Solo Plucked String Sound Detection by the Energy-to-Spectral Flux Ratio (ESFR)—Byung Suk Lee, LG Electronics Inc. - Seocho-Gu, Seoul, Korea, Columbia University, New York, NY, USA; Chang-Heon Lee, Yonsei University - Seoul, Korea; Gyuhyeok Jeong, In Gyu Kang, LG Electronics Inc. - Seocho-Gu, Seoul, Korea
We address the problem of distinguishing solo plucked string sound from speech. Because harmonic components are present in both types of signals, a low-complexity music/speech classifier often misclassifies them. To capture the sustained harmonic structures observed in solo plucked string sound, we propose a new feature, the Energy-to-Spectral Flux Ratio (ESFR). The values and statistics of the ESFR for solo plucked string sound were distinct from those for speech when calculated over windows of 20 to 50 ms. By building a low-complexity detector with the ESFR, we demonstrate the discriminating performance of the feature for the considered problem.
Convention Paper 8200
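The P8-3 abstract names the ESFR but does not define it. The sketch below assumes one plausible reading: the energy of a frame divided by the spectral flux between that frame and the previous one, with flux taken as the sum of squared differences of the magnitude spectra. The exact definition, the function names, and the `eps` guard are assumptions and may differ from the paper.

```python
import cmath
import math


def magnitude_spectrum(frame):
    """Magnitude of a direct O(N^2) DFT, bins 0..N/2 (fine for short frames)."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(frame)))
        for k in range(n // 2 + 1)
    ]


def esfr(prev_frame, frame, eps=1e-12):
    """Assumed ESFR: frame energy over spectral flux relative to prev_frame."""
    energy = sum(x * x for x in frame)
    prev_mag = magnitude_spectrum(prev_frame)
    mag = magnitude_spectrum(frame)
    flux = sum((a - b) ** 2 for a, b in zip(mag, prev_mag))
    return energy / (flux + eps)  # eps avoids division by zero for identical frames


# A sustained tone (same spectrum in both frames) versus a spectral change.
n = 32
steady = [math.sin(2 * math.pi * 4 * i / n) for i in range(n)]
changed = [math.sin(2 * math.pi * 9 * i / n) for i in range(n)]
sustained_ratio = esfr(steady, steady)
changing_ratio = esfr(steady, changed)
```

Under this assumed definition, a sustained harmonic sound yields a high ratio because its spectrum barely changes between frames, whereas a rapidly changing spectrum yields a low one.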
P8-4 Separation of Repeating and Varying Components in Audio Mixtures—Sean Coffin, Stanford University - Stanford, CA, USA
A large amount of modern pop music contains digital “loops” or “samples” (short audio clips) that appear multiple times during a song. This paper presents a novel approach to separating these exactly repeating component waveforms from the rest of an audio mixture. The approach examines time-frequency representations of the mixture during several instances of a single repeating component and, for each time-frequency bin, takes the complex value with the smallest magnitude across all instances. This effectively extracts the content that is perceived to be repeating, provided that the rest of the mixture varies sufficiently. Results are presented demonstrating successful application to commercially available recordings as well as to constructed audio mixtures, achieving signal-to-interference ratios of up to 42.8 dB.
Convention Paper 8201
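The per-bin selection described in the P8-4 abstract can be sketched as follows. This is a minimal illustration that assumes the time-frequency representations of each instance have already been computed and time-aligned; the function name `extract_repeating` and the nested-list layout `[frame][bin]` are assumptions, and the paper's actual processing chain may include further steps.

```python
def extract_repeating(instances):
    """Pick, for every time-frequency bin, the smallest-magnitude complex value.

    instances: list of equally shaped 2-D lists, indexed [frame][bin],
    each holding the complex spectrum of the mixture during one
    occurrence of the repeating component.
    """
    n_frames = len(instances[0])
    n_bins = len(instances[0][0])
    return [
        [min((inst[t][k] for inst in instances), key=abs) for k in range(n_bins)]
        for t in range(n_frames)
    ]


# Toy example: one frame, two bins. The repeating part is the same in both
# instances; the varying part occupies a different bin each time.
repeating = [[1 + 1j, 2 + 0j]]
instance_a = [[4 + 1j, 2 + 0j]]  # repeating + 3 in bin 0
instance_b = [[1 + 1j, 2 + 5j]]  # repeating + 5j in bin 1
estimate = extract_repeating([instance_a, instance_b])
```

In this toy case the varying content never occupies the same bin in both instances, so the minimum-magnitude choice recovers the repeating spectrum exactly. With real mixtures the estimate is only as good as the variation in the rest of the mixture allows.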
P8-5 High Quality Time-Domain Pitch Shifting Using PSOLA and Transient Preservation—Adrian von dem Knesebeck, Pooya Ziraksaz, Udo Zölzer, Helmut-Schmidt-University - Hamburg, Germany
An enhanced pitch shifting system is presented that uses the Pitch Synchronous Overlap Add (PSOLA) technique and transient detection to process monophonic speech or instrument signals. The PSOLA algorithm requires pitch information and pitch marks for signal segmentation in the analysis stage. The pitch is acquired using a well-established pitch detector. A new, robust pitch mark positioning algorithm is presented that achieves high-quality results and positions the pitch marks in a frame-based manner to enable real-time application. The quality of the pitch shifter is further enhanced by extracting the transient components before the PSOLA stage and reapplying them at the synthesis stage, which eliminates repetitions of the transients.
Convention Paper 8202