AES New York 2013
Poster Session P12
Friday, October 18, 3:00 pm — 4:30 pm (1EFoyer)
Poster: P12 - Signal Processing
P12-1 Temporal Synchronization for Audio Watermarking Using Reference Patterns in the Time-Frequency Domain—Tobias Bliem, Fraunhofer Institute for Integrated Circuits IIS - Erlangen, Germany; Juliane Borsum, Fraunhofer Institute for Integrated Circuits IIS - Erlangen, Germany; Giovanni Del Galdo, Fraunhofer Institute for Integrated Circuits IIS - Erlangen, Germany; Stefan Krägeloh, Fraunhofer Institute for Integrated Circuits IIS - Erlangen, Germany
Temporal synchronization is an important part of any audio watermarking system that involves an analog audio signal transmission. We propose a synchronization method based on the insertion of two-dimensional reference patterns in the time-frequency domain. The synchronization patterns consist of a combination of orthogonal sequences and are continuously embedded along with the transmitted data, so that the information capacity of the watermark is not affected. We investigate the relation between synchronization robustness and payload robustness and show that the length of the synchronization pattern can be used to tune a trade-off between synchronization robustness and the probability of false positive watermark decodings. Interpreting the two-dimensional binary patterns as one-dimensional N-ary sequences, we derive a bound for the autocorrelation properties of these sequences to facilitate an exhaustive search for good patterns.
Convention Paper 8975
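As an illustration of the kind of autocorrelation-driven exhaustive search mentioned in the P12-1 abstract, the following minimal sketch scores plain binary (+1/-1) sequences by their peak aperiodic autocorrelation sidelobe and searches all sequences of a short length. It is a simplified one-dimensional stand-in, not the authors' method: the paper works with two-dimensional patterns read as N-ary sequences and uses a derived bound to guide the search, neither of which is reproduced here. The function names are hypothetical.

```python
from itertools import product


def aperiodic_autocorrelation(seq):
    """Return the aperiodic autocorrelation of seq for lags 0..len(seq)-1."""
    n = len(seq)
    return [sum(seq[i] * seq[i + lag] for i in range(n - lag)) for lag in range(n)]


def peak_sidelobe(seq):
    """Largest autocorrelation magnitude over all non-zero lags (needs len >= 2)."""
    return max(abs(v) for v in aperiodic_autocorrelation(seq)[1:])


def exhaustive_search(length):
    """Try every +1/-1 sequence of the given length; keep the lowest sidelobe level."""
    best_level, best = None, []
    for candidate in product((1, -1), repeat=length):
        level = peak_sidelobe(candidate)
        if best_level is None or level < best_level:
            best_level, best = level, [candidate]
        elif level == best_level:
            best.append(candidate)
    return best_level, best


# The length-13 Barker code is a well-known sequence with very low sidelobes.
BARKER_13 = (1, 1, 1, 1, 1, -1, -1, 1, 1, -1, 1, -1, 1)
print("main lobe:", aperiodic_autocorrelation(BARKER_13)[0])
print("peak sidelobe:", peak_sidelobe(BARKER_13))

level, winners = exhaustive_search(7)
print("best sidelobe level for length 7:", level)
```

A low peak sidelobe means the pattern correlates strongly with itself only at the correct alignment, which is the property a synchronization pattern needs. The search cost doubles with every added symbol, which is why a bound that narrows the search is useful for longer sequences.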
P12-2 Sound Source Separation Using Interaural Intensity Difference in Real Environments—Chan Jun Chun, Gwangju Institute of Science and Technology (GIST) - Gwangju, Korea; Hong Kook Kim, Gwangju Institute of Science and Tech (GIST) - Gwangju, Korea
This paper proposes a sound source separation method that uses the interaural intensity difference (IID) of stereo audio signals recorded in real environments. First, in order to improve the channel separability, a minimum variance distortionless response (MVDR) beamformer is employed to increase the intensity difference between the stereo channels. Then, the IID between the stereo channels processed by the beamformer is computed and applied to sound source separation. The performance of the proposed method is evaluated using the stereo audio source separation evaluation campaign (SASSEC) measures. The evaluation shows that the proposed method outperforms a sound source separation method that does not apply a beamformer.
Convention Paper 8976
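The IID step described in the P12-2 abstract can be pictured with a minimal sketch that computes a per-frequency-bin intensity difference in dB between two channels and uses its sign to assign each bin to one side. This is an assumption-laden toy, not the authors' system: the MVDR beamformer is omitted, a single frame is analyzed with a naive DFT, and the names `dft`, `iid_db`, and the panning gains are hypothetical.

```python
import cmath
import math


def dft(frame):
    """Naive discrete Fourier transform (fine for short illustrative frames)."""
    n = len(frame)
    return [
        sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def iid_db(left, right, eps=1e-12):
    """Per-bin intensity difference in dB; positive means louder on the left."""
    spec_l, spec_r = dft(left), dft(right)
    half = len(left) // 2 + 1  # keep non-negative frequencies only
    return [
        20.0 * math.log10((abs(spec_l[k]) + eps) / (abs(spec_r[k]) + eps))
        for k in range(half)
    ]


# Toy stereo mix: source A sits in bin 4 and is panned left,
# source B sits in bin 10 and is panned right.
n = 64
src_a = [math.sin(2 * math.pi * 4 * t / n) for t in range(n)]
src_b = [math.sin(2 * math.pi * 10 * t / n) for t in range(n)]
left = [0.9 * a + 0.2 * b for a, b in zip(src_a, src_b)]
right = [0.2 * a + 0.9 * b for a, b in zip(src_a, src_b)]

iid = iid_db(left, right)
left_mask = [value > 0 for value in iid]  # bins attributed to the left-panned source
print("IID at bin 4:", round(iid[4], 2), "dB")
print("IID at bin 10:", round(iid[10], 2), "dB")
```

The larger the intensity difference between channels, the more reliably such a mask separates the sources, which is consistent with the abstract's motivation for increasing that difference before computing the IID.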
P12-3 Reverberation and Dereverberation Effect on Byzantine Chants—Alexandros Tsilfidis, accusonus, Patras Innovation Hub - Patras, Greece; Charalampos Papadakos, University of Patras - Patras, Greece; Elias Kokkinis, accusonus - Patras, Greece; Georgios Chryssochoidis, National and Kapodistrian University of Athens - Athens, Greece; Dimitrios Delviniotis, National and Kapodistrian University of Athens - Athens, Greece; Georgios Kouroupetroglou, National and Kapodistrian University of Athens - Athens, Greece; John Mourjopoulos, University of Patras - Patras, Greece
Byzantine music is typically monophonic and is characterized by (i) prolonged music phrases and (ii) Byzantine scales that often contain intervals smaller than the Western semitone. As happens with most religious music genres, reverberation is a key element of Byzantine music. Byzantine churches/cathedrals are usually characterized by particularly diffuse fields and very long Reverberation Time (RT) values. In the first part of this work, the perceptual effect of long reverberation on Byzantine music excerpts is investigated. Then, a case where Byzantine music is recorded in non-ideal acoustic conditions is considered. In such scenarios, a sound engineer might need to add artificial reverb to the recordings. Here it is suggested that the addition of extra reverberation can be preceded by a dereverberation step that suppresses the originally recorded non-ideal reverberation. Therefore, the second part of the paper presents a subjective test that evaluates this sound engineering scenario.
Convention Paper 8977
P12-4 Cepstrum-Based Preprocessing for Howling Detection in Speech Applications—Renhua Peng, Chinese Academy of Sciences - Beijing, China; Chinese Academy of Sciences - Shanghai, China; Jian Li, Chinese Academy of Sciences - Beijing, China; Chinese Academy of Sciences - Shanghai, China; Chengshi Zheng, Chinese Academy of Sciences - Beijing, China; Chinese Academy of Sciences - Shanghai, China; Xiaoliang Chen, Chinese Academy of Sciences - Beijing, China; Chinese Academy of Sciences - Shanghai, China; Xiaodong Li, Chinese Academy of Sciences - Beijing, China; Chinese Academy of Sciences - Shanghai, China
Conventional howling detection algorithms exhibit dramatic performance degradation in the presence of harmonic components of speech that have properties similar to those of the howling components. To solve this problem, this paper proposes a cepstrum preprocessing-based howling detection algorithm. First, the impact of howling components on cepstral coefficients is studied in both theory and simulation. Second, according to the theoretical results, the cepstrum preprocessing-based howling detection algorithm is proposed. The Receiver Operating Characteristic (ROC) simulation results indicate that the proposed algorithm can increase the detection probability at the same false alarm rate. Objective measurements, such as Speech Distortion (SD) and Maximum Stable Gain (MSG), further confirm the validity of the proposed algorithm.
Convention Paper 8978
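For readers unfamiliar with the cepstrum referred to in the P12-4 abstract, the sketch below computes a real cepstrum (the inverse DFT of the log-magnitude spectrum) and shows that a frame with harmonic structure produces a cepstral peak at its period. It only illustrates the transform itself; the paper's preprocessing and detection rules are not given in the abstract and are not reproduced here. The helper names and the `eps` floor are assumptions.

```python
import cmath
import math


def dft(x, inverse=False):
    """Naive forward or inverse DFT, adequate for short illustrative frames."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [
        sum(x[t] * cmath.exp(sign * 2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]
    return [v / n for v in out] if inverse else out


def real_cepstrum(frame, eps=1e-9):
    """Real cepstrum: inverse DFT of the log-magnitude spectrum.

    eps is a small floor (an assumption of this sketch) that avoids log(0).
    """
    log_mag = [math.log(abs(v) + eps) for v in dft(frame)]
    return [v.real for v in dft(log_mag, inverse=True)]


# A 64-sample impulse train with period 8 stands in for a harmonic (voiced) frame.
n, period = 64, 8
harmonic_frame = [1.0 if t % period == 0 else 0.0 for t in range(n)]
cep = real_cepstrum(harmonic_frame)
print("cepstrum at quefrency 8:", round(cep[period], 3))
print("cepstrum at quefrency 5:", round(cep[5], 6))
```

Because harmonic energy concentrates at quefrencies tied to the pitch period, the cepstral domain offers a handle for treating speech harmonics separately from other narrowband components.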
P12-5 Delayless Method to Suppress Transient Noise Using Speech Properties and Spectral Coherence—Chengshi Zheng, Chinese Academy of Sciences - Beijing, China; Chinese Academy of Sciences - Shanghai, China; Xiaoliang Chen, Chinese Academy of Sciences - Beijing, China; Chinese Academy of Sciences - Shanghai, China; Shiwei Wang, Chinese Academy of Sciences - Beijing, China; Chinese Academy of Sciences - Shanghai, China; Renhua Peng, Chinese Academy of Sciences - Beijing, China; Chinese Academy of Sciences - Shanghai, China; Xiaodong Li, Chinese Academy of Sciences - Beijing, China; Chinese Academy of Sciences - Shanghai, China
This paper proposes a novel delayless transient noise reduction method that is based on speech properties and spectral coherence. The proposed method has three stages. First, the transient noise components are detected in each subband by using energy-normalized variance. Second, we apply the harmonic property of voiced speech and the continuity of the speech signal to reduce speech distortion in voiced speech segments. Third, we define a new spectral coherence to distinguish unvoiced speech from transient noise, so that unvoiced speech is not suppressed. Compared with existing methods, the proposed method is computationally efficient and causal. Experimental results show that the proposed algorithm can effectively suppress transient noise by up to 30 dB without introducing audible speech distortion.
Convention Paper 8979
P12-6 Artificial Stereo Extension Based on Hidden Markov Model for the Incorporation of Non-Stationary Energy Trajectory—Nam In Park, Gwangju Institute of Science and Technology (GIST) - Gwangju, Korea; Kwang Myung Jeon, Gwangju Institute of Science and Technology (GIST) - Gwangju, Korea; Seung Ho Choi, Prof., Seoul National University of Science and Technology - Seoul, Korea; Hong Kook Kim, Gwangju Institute of Science and Tech (GIST) - Gwangju, Korea
In this paper an artificial stereo extension method is proposed to provide stereophonic sound from mono sound. While frame-independent artificial stereo extension methods, such as Gaussian mixture model (GMM)-based extension, do not consider the correlation of energies of previous frames, the proposed stereo extension method employs a minimum mean-squared error estimator based on a hidden Markov model (HMM) for the incorporation of non-stationary energy trajectory. The performance of the proposed stereo extension method is evaluated by a multiple stimuli with a hidden reference and anchor (MUSHRA) test. It is shown from the statistical analysis of the MUSHRA test results that the stereo signals extended by the proposed stereo extension method have significantly better quality than those of a GMM-based stereo extension method.
Convention Paper 8980
P12-7 Simulation of an Analog Circuit of a Wah Pedal: A Port-Hamiltonian Approach—Antoine Falaize-Skrzek, IRCAM - Paris, France; Thomas Hélie, IRCAM-CNRS UMR 9912-UPMC - Paris, France
Several methods are available to simulate electronic circuits. However, for nonlinear circuits, the stability guarantee is not straightforward. In this paper the approach of the so-called "Port-Hamiltonian Systems" (PHS) is considered. This framework naturally preserves the energetic behavior of elementary components and the power exchanges between them. This guarantees the passivity of the (source-free part of the) circuit.
Convention Paper 8981
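The passivity claim in the P12-7 abstract can be made concrete with the standard textbook input-state-output form of a port-Hamiltonian system. The notation below (state x, stored energy H, interconnection matrix J, dissipation matrix R, input map G, port variables u and y) is the generic one and is assumed here; the paper may use a different formulation.

```latex
% Generic input-state-output port-Hamiltonian system (assumed notation)
\begin{aligned}
\dot{x} &= \bigl(J(x) - R(x)\bigr)\,\nabla H(x) + G(x)\,u, \\
y       &= G(x)^{\top}\,\nabla H(x),
\qquad J = -J^{\top}, \quad R = R^{\top} \succeq 0 .
\end{aligned}

% Power balance: the skew-symmetric J contributes no power,
% R can only dissipate, and y^T u is the power supplied through the ports.
\frac{\mathrm{d}H}{\mathrm{d}t}
  = \nabla H^{\top}\dot{x}
  = -\,\nabla H^{\top} R\,\nabla H + y^{\top}u
  \;\le\; y^{\top}u .
```

With the sources removed (u = 0), the stored energy H cannot increase, which is the sense in which the source-free part of the circuit is passive.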
P12-8 Improvement in Parametric High-Band Audio Coding by Controlling Temporal Envelope with Phase Parameter—Kijun Kim, Kwangwoon University - Seoul, Korea; Kihyun Choo, Samsung Electronics Co., Ltd. - Suwon, Korea; Eunmi Oh, Samsung Electronics Co., Ltd. - Suwon, Korea; Hochong Park, Kwangwoon University - Seoul, Korea
This study proposes a method to improve temporal envelope control in parametric high-band audio coding. Conventional parametric high-band coders may have difficulty controlling the fine high-band temporal envelope, which can degrade sound quality for certain audio signals. In this study a novel method is designed to control the temporal envelope using spectral phase as an additional parameter. Objective and subjective evaluations suggest that the proposed method improves the quality of sounds whose temporal envelope is severely degraded by the conventional method.
Convention Paper 8982