AES E-Library

A Recursive Attractor Network for Long-Form Sound Source Localization and Identity Tracking with a Variable Number of Sources

Sound source localization and identity tracking are fundamental tasks in acoustic scene analysis, enabling machines to determine what sound events occur, where, and when. While deep attractor-based networks have demonstrated improved performance when the number of sources is unknown, maintaining continuous source tracking over long-form audio remains challenging due to memory limitations and permutation ambiguities across adjacent segments. In this paper, we propose a Recursive Attractor Network (RANet) for long-form sound source localization and identity tracking with a variable number of sources. RANet explicitly represents attractors as transferable embeddings and recursively propagates them across adjacent audio segments using an LSTM-based model, thereby preserving source identity continuity over time. Experimental results on simulated datasets demonstrate that RANet achieves robust long-form localization and consistent source identity tracking, outperforming baseline approaches.
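The idea of carrying attractors from one segment to the next can be illustrated with a toy sketch. This is not RANet: the LSTM-based propagation is replaced here by a plain exponential moving average, and identity assignment uses greedy cosine matching. The function names, the `new_source_threshold` and `momentum` parameters, and the two-dimensional embeddings are all hypothetical, chosen only to show how carried state resolves the permutation ambiguity between adjacent segments and admits a variable number of sources.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def track_segments(segments, new_source_threshold=0.5, momentum=0.5):
    """Assign a persistent identity to every source embedding in every segment.

    segments: list of segments; each segment is an unordered list of
              per-source embedding vectors (hypothetical model outputs).
    Returns one list of integer identity labels per segment.
    """
    attractors = []  # carried state: one embedding per known identity
    labels_per_segment = []
    for embeddings in segments:
        labels = []
        used = set()  # an identity may be claimed once per segment
        for emb in embeddings:
            best_id, best_sim = None, new_source_threshold
            for idx, att in enumerate(attractors):
                if idx in used:
                    continue
                sim = cosine(emb, att)
                if sim > best_sim:
                    best_id, best_sim = idx, sim
            if best_id is None:
                # No carried attractor is close enough: treat as a new source.
                attractors.append(list(emb))
                best_id = len(attractors) - 1
            else:
                # Assumption: a moving average stands in for the learned
                # recurrent update of the carried attractor.
                attractors[best_id] = [
                    momentum * a + (1.0 - momentum) * e
                    for a, e in zip(attractors[best_id], emb)
                ]
            used.add(best_id)
            labels.append(best_id)
        labels_per_segment.append(labels)
    return labels_per_segment


if __name__ == "__main__":
    demo = [
        [[1.0, 0.0], [0.0, 1.0]],  # two sources
        [[0.1, 0.9], [0.9, 0.1]],  # same sources, output order swapped
        [[1.0, 0.1]],              # only the first source remains
    ]
    print(track_segments(demo))
```

In the demo, the second segment lists its sources in the opposite order, yet each keeps the identity it was given in the first segment because the match is made against the carried attractors rather than against output position.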

 


