Multichannel speaker diarization with arbitrary microphone arrays

Xylogiannis, Paris and Vryzas, Nikolaos and Bountourakis, Vasileios and Dimoulas, Charalampos

AES E-Library

Speaker diarization remains a field with potential for improvement. In meeting scenarios, the task of labeling audio with the corresponding speaker identities can be further assisted by exploiting spatial features. In the present work, a framework is designed to evaluate the combination of speaker embeddings with Time Difference of Arrival (TDOA) values. Speaker embeddings are extracted using two popular pre-trained models, ECAPA-TDNN and x-vectors. TDOA values for every speech segment are calculated using the Generalized Cross-Correlation (GCC) method with Phase Transform (PHAT) weighting (GCC-PHAT). The outputs of the GCC-PHAT and deep neural network (DNN) systems are fused by concatenation and used as the input to spectral clustering. The objective of the proposed framework is to evaluate the potential of exploiting available microphone arrays in meetings and to investigate the complementary information between TDOA and speaker embeddings. The system is evaluated on two different datasets: the AVLab Speaker Localization dataset and a multichannel dataset created in the context of the present work. Furthermore, an additional dataset using the embedded microphones of mobile phones is created and openly distributed to help research groups address complex problems such as speaker localization and diarization with arbitrary arrays comprising microphones of different characteristics and quality.
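The pipeline described in the abstract (per-segment TDOA estimation with GCC-PHAT, then fusion with speaker embeddings by concatenation) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The function names `gcc_phat_tdoa`, `fuse_features`, and `cosine_affinity` are hypothetical, and the feature scaling (L2-normalized embeddings, z-scored TDOAs) and the clipped cosine affinity are assumptions made for illustration only, since the abstract does not specify them.

```python
import numpy as np


def gcc_phat_tdoa(sig, ref, fs, max_tau=None):
    """Estimate the delay of `sig` relative to `ref`, in seconds, via GCC-PHAT.

    A positive result means `sig` lags behind `ref`.
    """
    n = len(sig) + len(ref)  # zero-pad so the circular correlation does not wrap
    SIG = np.fft.rfft(sig, n=n)
    REF = np.fft.rfft(ref, n=n)
    cross = SIG * np.conj(REF)
    cross /= np.abs(cross) + 1e-12  # PHAT weighting: discard magnitude, keep phase
    cc = np.fft.irfft(cross, n=n)

    max_shift = n // 2
    if max_tau is not None:
        # Restrict the search to physically plausible delays (at least one sample).
        max_shift = max(1, min(int(fs * max_tau), max_shift))

    # Reorder so that lags run from -max_shift to +max_shift.
    cc = np.concatenate((cc[-max_shift:], cc[: max_shift + 1]))
    lag = int(np.argmax(cc)) - max_shift
    return lag / fs


def fuse_features(embeddings, tdoas):
    """Concatenate per-segment speaker embeddings with per-segment TDOA vectors.

    embeddings: (num_segments, embedding_dim)
    tdoas:      (num_segments, num_microphone_pairs)
    The scaling below is an assumption, not taken from the paper.
    """
    emb = embeddings / (np.linalg.norm(embeddings, axis=1, keepdims=True) + 1e-12)
    td = (tdoas - tdoas.mean(axis=0)) / (tdoas.std(axis=0) + 1e-12)
    return np.hstack((emb, td))


def cosine_affinity(features):
    """Non-negative cosine-similarity matrix between fused segment features."""
    unit = features / (np.linalg.norm(features, axis=1, keepdims=True) + 1e-12)
    return np.clip(unit @ unit.T, 0.0, 1.0)
```

In such a setup, `gcc_phat_tdoa` would be called once per microphone pair for each speech segment, and the resulting affinity matrix could be handed to any spectral clustering implementation that accepts a precomputed affinity, for example scikit-learn's `SpectralClustering(affinity="precomputed")`.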

Author(s): Xylogiannis, Paris; Vryzas, Nikolaos; Bountourakis, Vasileios; Dimoulas, Charalampos
Affiliation: Aristotle University of Thessaloniki, Greece; Aristotle University of Thessaloniki, Greece; Aalto University, Espoo, Finland; Aristotle University of Thessaloniki, Greece (See document for exact affiliation information.)
AES Convention: 154 Paper Number: 89
Publication Date: 2023-05-06
Session subject: Spatial Audio

Type: Express Paper
