Speaker diarization remains a field with potential for improvement. In meeting scenarios, the task of labeling audio with the corresponding speaker identities can be further assisted by exploiting spatial features. In the present work, a framework is designed to evaluate the combination of speaker embeddings with Time Difference of Arrival (TDOA) values. Speaker embeddings are extracted using two popular pre-trained models, ECAPA-TDNN and x-vectors. TDOA values for every speech segment are calculated using the Generalized Cross-Correlation (GCC) method with phase transform (PHAT) weighting (GCC-PHAT). The outputs of the GCC-PHAT and deep neural network (DNN) systems are fused by concatenation and used as the input to spectral clustering. The objective of the proposed framework is to evaluate the potential of exploiting the microphone arrays available in meetings and to investigate the complementary information between TDOA values and speaker embeddings. The system is evaluated on two datasets: the AVLab Speaker Localization dataset and a multichannel dataset created in the context of the present work. Furthermore, an additional dataset recorded with the embedded microphones of mobile phones is created and openly distributed to help research groups address complex problems such as speaker localization and diarization with arbitrary arrays comprising microphones of different characteristics and quality.
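The abstract outlines three steps: per-segment TDOA estimation with GCC-PHAT, fusion of the TDOA values with speaker embeddings by concatenation, and spectral clustering of the fused vectors. The Python/NumPy sketch below illustrates these steps under stated assumptions. It is not the authors' implementation. The function names (`gcc_phat_tdoa`, `segment_tdoas`, `fuse`, `spectral_clustering`) are hypothetical. The speaker embeddings are assumed to be precomputed, for example by an ECAPA-TDNN or x-vector model. The per-block scaling in `fuse`, the Gaussian affinity with a median-distance bandwidth, and a known number of speakers `k` are also illustrative assumptions.

```python
import numpy as np


def gcc_phat_tdoa(sig, ref, fs, max_tau=None, interp=16):
    """TDOA (seconds) of `sig` relative to `ref` using GCC-PHAT.

    A positive value means `sig` arrives later than `ref`.
    """
    n = sig.shape[0] + ref.shape[0]          # zero-pad to avoid circular wrap-around
    cross = np.fft.rfft(sig, n=n) * np.conj(np.fft.rfft(ref, n=n))
    cross /= np.abs(cross) + 1e-12           # PHAT: discard magnitude, keep phase
    cc = np.fft.irfft(cross, n=interp * n)   # zero-padded spectrum -> sub-sample lags
    max_shift = interp * n // 2
    if max_tau is not None:                  # e.g. mic spacing / speed of sound
        max_shift = max(1, min(int(interp * fs * max_tau), max_shift))
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))  # lags -max..+max
    return (np.argmax(np.abs(cc)) - max_shift) / float(interp * fs)


def segment_tdoas(multichannel, fs, segments, ref_ch=0, max_tau=None):
    """One TDOA vector per segment: every channel against a reference channel.

    `multichannel` has shape (channels, samples); `segments` holds
    (start, end) sample indices (hypothetical segmentation input).
    """
    feats = []
    for start, end in segments:
        ref = multichannel[ref_ch, start:end]
        feats.append([gcc_phat_tdoa(multichannel[ch, start:end], ref, fs, max_tau)
                      for ch in range(multichannel.shape[0]) if ch != ref_ch])
    return np.asarray(feats)


def fuse(embeddings, tdoas, tdoa_weight=1.0):
    """Concatenate speaker embeddings with TDOA features (assumed scaling)."""
    emb = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    tdoa = (tdoas - tdoas.mean(axis=0)) / (tdoas.std(axis=0) + 1e-12)
    tdoa *= tdoa_weight / np.sqrt(tdoa.shape[1])   # keep block norms comparable
    return np.hstack([emb, tdoa])


def spectral_clustering(x, k, n_iter=50):
    """Ng-Jordan-Weiss style spectral clustering with a Gaussian affinity."""
    d2 = np.sum((x[:, None, :] - x[None, :, :]) ** 2, axis=-1)
    a = np.exp(-d2 / np.median(d2[d2 > 0]))        # median-distance bandwidth
    np.fill_diagonal(a, 0.0)
    dinv = 1.0 / np.sqrt(a.sum(axis=1))
    m = dinv[:, None] * a * dinv[None, :]          # D^-1/2 A D^-1/2
    _, vecs = np.linalg.eigh(m)                    # eigenvalues in ascending order
    u = vecs[:, -k:]                               # k leading eigenvectors
    u /= np.linalg.norm(u, axis=1, keepdims=True)
    # k-means on the rows of u, with deterministic farthest-point initialisation
    centers = [u[0]]
    for _ in range(1, k):
        dist = np.min([np.sum((u - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(u[np.argmax(dist)])
    centers = np.array(centers)
    for _ in range(n_iter):
        labels = np.argmin(((u[:, None, :] - centers[None]) ** 2).sum(-1), axis=1)
        centers = np.array([u[labels == j].mean(axis=0) if np.any(labels == j)
                            else centers[j] for j in range(k)])
    return labels
```

In this sketch, the TDOA block is standardized and scaled so that it has roughly the same norm as the unit-length embedding block. Without that scaling, raw TDOA values (on the order of 1e-4 s) would contribute almost nothing to the Euclidean distances that drive the affinity matrix. The `tdoa_weight` parameter is a hypothetical knob for trading off spatial against speaker-identity cues.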
Author(s): Xylogiannis, Paris; Vryzas, Nikolaos; Bountourakis, Vasileios; Dimoulas, Charalampos
Affiliation: Aristotle University of Thessaloniki, Greece; Aristotle University of Thessaloniki, Greece; Aalto University, Espoo, Finland; Aristotle University of Thessaloniki, Greece
(See document for exact affiliation information.)
AES Convention: 154
Paper Number: 89
Publication Date: 2023-05-06
Session subject: Spatial Audio

Xylogiannis, Paris; Vryzas, Nikolaos; Bountourakis, Vasileios; Dimoulas, Charalampos; 2023; Multichannel speaker diarization with arbitrary microphone arrays [PDF]; Aristotle University of Thessaloniki, Greece; Aristotle University of Thessaloniki, Greece; Aalto University, Espoo, Finland; Aristotle University of Thessaloniki, Greece; Paper 89; Available from: https://aes.org/publications/elibrary-page/?id=22114
Xylogiannis, Paris; Vryzas, Nikolaos; Bountourakis, Vasileios; Dimoulas, Charalampos; Multichannel speaker diarization with arbitrary microphone arrays [PDF]; Aristotle University of Thessaloniki, Greece; Aristotle University of Thessaloniki, Greece; Aalto University, Espoo, Finland; Aristotle University of Thessaloniki, Greece; Paper 89; 2023 Available: https://aes.org/publications/elibrary-page/?id=22114
@inproceedings{Xylogiannis2023multichannel,
title={{Multichannel speaker diarization with arbitrary microphone arrays}},
author={Xylogiannis, Paris and Vryzas, Nikolaos and Bountourakis, Vasileios and Dimoulas, Charalampos},
year={2023},
month={may},
booktitle={Journal of the Audio Engineering Society},
publisher={Express Paper 89; AES Convention 154; May 2023},
number={89},
organization={AES},
}