This study is concerned with listeners’ natural ability to identify an anonymous speaker’s gender and emotion from voice alone. We attempt to map psychological characteristics of the speaker, such as gender image and emotion, to acoustical properties. The acoustical parameters measured from the voice samples were pitch (mean, maximum, and minimum), pitch variation over time, jitter, shimmer, and Harmonics-to-Noise Ratio (HNR). Participants listened to 2-second voice clips and rated each voice’s gender image and emotion on a 7-point scale. Emotional responses were obtained for 7 opposite pairs of affective attributes (Gobl and Ní Chasaide, 2003): relaxed/stressed, content/angry, friendly/hostile, sad/happy, bored/interested, intimate/formal, and timid/confident. Experimental results show that listeners were able to identify voice gender and assess emotional status from short utterances. Statistical analyses revealed that these acoustic parameters were related to listeners’ perception of a voice’s gender image and its affective attributes. For voice gender perception, there were significant correlations with jitter, shimmer, and HNR in addition to the pitch parameters. For perception of affective attributes, acoustic parameters were analyzed with respect to the valence and arousal dimensions. Voices perceived as positive tended to have higher variance in pitch and higher maximum pitch than those perceived as negative. Voices perceived as strongly active tended to have a higher number of voice breaks, higher jitter and shimmer, and lower HNR than those perceived as passive. We expect that our experimental results on mapping acoustical parameters to voice gender and emotion perception could be applied in the field of Artificial Intelligence (AI) when assigning a specific tone or quality to voice agents.
Moreover, such psycho-acoustical mapping can improve the naturalness of synthesized speech, especially neural TTS (Text-To-Speech), because it can assist in selecting the appropriate speech database for voice interaction and for situations where certain voice gender and affective expressions are needed.
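For readers unfamiliar with the perturbation measures named in the abstract, the following is a minimal sketch of one common definition of jitter and shimmer (the "local" variant: mean absolute difference between consecutive cycles, divided by the mean). The abstract does not say which variant or which analysis tool the authors used, so the function names, the choice of definition, and the sample values are all assumptions made for illustration only.

```python
from statistics import mean


def relative_perturbation(values):
    """Mean absolute difference between consecutive values, divided by the mean value.

    Hypothetical helper: applied to glottal cycle durations it gives a "local"
    jitter figure, applied to per-cycle peak amplitudes a "local" shimmer figure.
    """
    if len(values) < 2:
        raise ValueError("need at least two cycles")
    diffs = [abs(a - b) for a, b in zip(values, values[1:])]
    return mean(diffs) / mean(values)


def local_jitter(periods_s):
    # periods_s: durations of consecutive pitch cycles, in seconds (assumed input)
    return relative_perturbation(periods_s)


def local_shimmer(peak_amplitudes):
    # peak_amplitudes: peak amplitude of each consecutive cycle (assumed input)
    return relative_perturbation(peak_amplitudes)


if __name__ == "__main__":
    # Illustrative values only, not data from the paper.
    periods = [0.0100, 0.0102, 0.0099, 0.0101]
    amplitudes = [0.80, 0.78, 0.82, 0.79]
    print(f"jitter (local):  {local_jitter(periods):.4f}")
    print(f"shimmer (local): {local_shimmer(amplitudes):.4f}")
```

Both measures are zero for a perfectly regular voice and grow as cycle-to-cycle irregularity increases, which is consistent with the abstract's observation that voices perceived as strongly active showed higher jitter and shimmer.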
Author(s): Oh, Eunmi; Lee, Jaeeun; Lee, Dayoung
Affiliation: Yonsei University, Seoul, Korea (See document for exact affiliation information.)
AES Convention: 150
Paper Number: 10461
Publication Date: 2021-05-06
Session subject: Psychology

Oh, Eunmi; Lee, Jaeeun; Lee, Dayoung; 2021; Mapping voice gender and emotion to acoustic properties of natural speech [PDF]; Yonsei University, Seoul, Korea; Paper 10461; Available from: https://aes.org/publications/elibrary-page/?id=21054
@inproceedings{Oh2021mapping,
title={{Mapping voice gender and emotion to acoustic properties of natural speech}},
author={Oh, Eunmi and Lee, Jaeeun and Lee, Dayoung},
year={2021},
month={may},
booktitle={Journal of the Audio Engineering Society},
publisher={Paper 10461; AES Convention 150; May 2021},
number={10461},
organization={AES},
}