AES E-Library

The Influence of Cross-Modal Interaction on Audio-Visual Speech Quality Perception

Combined audio-visual services are expected to become a feature of the next generation of telecommunication systems. Current systems apply perceptually motivated data reduction to the audio and the video separately. A system that applies perceptually motivated data reduction to the combined audio-visual content may achieve greater data reduction while maintaining a high-quality service over low bit-rate transmission media. To design such bimodal codecs and optimize their overall performance when low bit-rate codecs are used, it is important to understand the trade-offs between audio quality and video quality. This paper compares the results of a complementary pair of subjective experiments designed to investigate the differences between visual-speech and non-visual-speech quality perception, and the perception of quality mismatch between the audio and the video. Conclusions are drawn about how cross-modal interaction, and particularly speech quality perception, varies with audio-visual content. The results will be used in the design of a bimodal perceptual model.

 
