In this paper, we propose a method for real-time speech emotion recognition (SER) tailored for human-robot interaction. Traditional SER techniques analyze entire utterances and therefore often struggle in real-time scenarios because of their high latency. To overcome this challenge, the proposed method breaks speech down into short, overlapping segments and uses a soft voting mechanism to aggregate emotion probabilities in real time. The method is applied to an SER model that uses the pre-trained wav2vec 2.0 for feature extraction and a convolutional network for emotion classification. Performance was evaluated on the KEMDy19 dataset, a Korean emotion dataset, using four key emotions: anger, happiness, neutrality, and sadness. Compared with processing entire utterances, the real-time method with segment durations of 0.5 and 3.0 seconds showed relative reductions in unweighted accuracy of 10.61% and 5.08%, respectively. However, the real-time factor (RTF) was significantly improved.
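The segment-and-vote idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state the hop size, the voting weights, or how short utterances are handled, so the equal-weight running average, the handling of inputs shorter than one segment, and all names below (`split_segments`, `SoftVoter`, `argmax_label`) are assumptions made for illustration. The per-segment probabilities would come from a classifier such as the wav2vec 2.0 plus convolutional model, which is not reproduced here.

```python
from typing import Iterator, List, Sequence

# The four emotion classes named in the abstract.
EMOTIONS = ("anger", "happiness", "neutrality", "sadness")


def split_segments(samples: Sequence[float], seg_len: int, hop: int) -> Iterator[Sequence[float]]:
    """Yield windows of seg_len samples, advancing by hop samples.

    Windows overlap whenever hop < seg_len. Assumption: an input shorter
    than one segment is yielded whole, and a trailing partial window is dropped.
    """
    if seg_len <= 0 or hop <= 0:
        raise ValueError("seg_len and hop must be positive")
    if len(samples) <= seg_len:
        yield samples
        return
    for start in range(0, len(samples) - seg_len + 1, hop):
        yield samples[start:start + seg_len]


class SoftVoter:
    """Running, equally weighted mean of per-segment class probabilities."""

    def __init__(self, num_classes: int) -> None:
        self.sums = [0.0] * num_classes
        self.count = 0

    def update(self, probs: Sequence[float]) -> List[float]:
        """Add one segment's probabilities and return the current average."""
        if len(probs) != len(self.sums):
            raise ValueError("unexpected number of classes")
        for i, p in enumerate(probs):
            self.sums[i] += p
        self.count += 1
        return [s / self.count for s in self.sums]


def argmax_label(probs: Sequence[float]) -> str:
    """Map averaged probabilities to the most likely emotion label."""
    best = max(range(len(probs)), key=lambda i: probs[i])
    return EMOTIONS[best]


# Usage with made-up probabilities standing in for classifier outputs.
voter = SoftVoter(len(EMOTIONS))
voter.update([0.7, 0.1, 0.1, 0.1])
averaged = voter.update([0.1, 0.5, 0.2, 0.2])
label = argmax_label(averaged)
```

Because the average is updated after every segment, a decision is available as soon as the first segment has been classified and is refined as more speech arrives, which is the latency benefit the abstract attributes to segment-level processing.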
Authors: Jun, Jimin; Kim, Hong Kook
Affiliation: School of Electrical Engineering and Computer Science, Gwangju Institute of Science and Technology (GIST); AI Graduate School, Gwangju Institute of Science and Technology (GIST)
(See document for exact affiliation information.)
AES Convention: 157
Paper Number: 288
Publication Date: 2024-09-27
DOI:

Jun, Jimin; Kim, Hong Kook; 2024; Real-time Speech Emotion Recognition for Human-robot Interaction [PDF]; School of Electrical Engineering and Computer Science, Gwangju Institute of Science and Technology (GIST); AI Graduate School, Gwangju Institute of Science and Technology (GIST); Paper 288; Available from: https://aes.org/publications/elibrary-page/?id=22745