As Computational Auditory Scene Analysis has gained attention with advances in artificial intelligence, interest in Acoustic Scene Classification (ASC) has grown rapidly. However, developing ASC models requires large-scale training datasets, and most publicly available datasets were recorded with mobile devices and are limited to mono or stereo formats. To address this limitation, this paper presents a new dataset of urban acoustic scenes recorded in binaural format to support research in acoustic scene classification. The dataset comprises 2 hours and 52 minutes of urban sound recorded with a Neumann KU-100 binaural microphone, along with corresponding metadata that includes annotated scene types (e.g., indoor and outdoor) and detailed labels such as urban parks, libraries, streets, subways, stores, and train stations. Recordings were made at diverse locations across New York City. The audio is segmented into 10-second clips to support machine learning and deep learning workflows. To ensure overall quality, all recordings were reviewed, and segments containing unwanted noise were removed. Because the binaural format preserves spatial auditory cues, the dataset is expected to support training models with more realistic, human-like perception and thereby improve classification accuracy. It is also expected to benefit a wide range of applications where accurate environmental awareness is crucial, such as AI-based recognition systems, AR/VR applications, and intelligent robotic services. The dataset will be made publicly available to support reproducibility and advance further research in acoustic scene classification.
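The abstract states only that the recordings were cut into 10-second clips. The sketch below shows one plausible way to do fixed-length segmentation of a two-channel binaural signal. The 48 kHz sample rate, the function names `clip_boundaries` and `segment_binaural`, and the choice to discard a trailing partial clip are all hypothetical assumptions, not details taken from the paper.

```python
from typing import List, Sequence, Tuple

# Hypothetical parameters: the abstract specifies 10-second clips, but not the
# sample rate or how leftover audio at the end of a recording is handled.
CLIP_SECONDS = 10
SAMPLE_RATE = 48_000  # assumed, not stated in the abstract


def clip_boundaries(num_frames: int, sample_rate: int = SAMPLE_RATE,
                    clip_seconds: int = CLIP_SECONDS) -> List[Tuple[int, int]]:
    """Return (start, end) frame indices of non-overlapping fixed-length clips.

    Trailing audio shorter than one full clip is dropped (an assumption).
    """
    clip_len = sample_rate * clip_seconds
    return [(start, start + clip_len)
            for start in range(0, num_frames - clip_len + 1, clip_len)]


def segment_binaural(left: Sequence[float], right: Sequence[float],
                     sample_rate: int = SAMPLE_RATE,
                     clip_seconds: int = CLIP_SECONDS):
    """Cut a two-channel (binaural) recording into aligned fixed-length clips.

    Both ear channels are sliced with identical boundaries, so the interaural
    time and level differences within each clip stay intact.
    """
    if len(left) != len(right):
        raise ValueError("left and right channels must have equal length")
    return [(left[s:e], right[s:e])
            for s, e in clip_boundaries(len(left), sample_rate, clip_seconds)]
```

Under these assumptions, a single 35-second recording at 48 kHz yields three 10-second clips, and the final 5 seconds are discarded. Slicing both ears with the same boundaries matters for binaural data, because a misaligned cut would distort the spatial cues that motivate the format.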
Author(s): Kim, Yujin; Songmuang, Parichat; Chen, Enoch
Affiliation: New York University; New York University; New York University (see document for exact affiliation information)
AES Convention: 159
Paper Number: 367
Publication Date: 2025-10-14
DOI:
Kim, Yujin; Songmuang, Parichat; Chen, Enoch; 2025; Capturing the City in 3D: A Binaural Urban Sound Dataset for Acoustic Scene Analysis [PDF]; New York University; New York University; New York University; Paper 367; Available from: https://aes.org/publications/elibrary-page/?id=23041
Kim, Yujin; Songmuang, Parichat; Chen, Enoch; Capturing the City in 3D: A Binaural Urban Sound Dataset for Acoustic Scene Analysis [PDF]; New York University; New York University; New York University; Paper 367; 2025 Available: https://aes.org/publications/elibrary-page/?id=23041
@inproceedings{Kim2025capturing,
title={{Capturing the City in 3D: A Binaural Urban Sound Dataset for Acoustic Scene Analysis}},
author={Kim, Yujin and Songmuang, Parichat and Chen, Enoch},
year={2025},
month={oct},
booktitle={Audio Engineering Society Convention 159},
publisher={},
number={367},
organization={AES},
}