Capturing the City in 3D: A Binaural Urban Sound Dataset for Acoustic Scene Analysis

Kim, Yujin and Songmuang, Parichat and Chen, Enoch

AES E-Library

As Computational Auditory Scene Analysis has gained attention with the advancement of artificial intelligence technologies, interest in Acoustic Scene Classification (ASC) has grown rapidly. However, developing ASC models requires large-scale training datasets, and most publicly available datasets were recorded with mobile devices and are limited to mono or stereo format. To address this limitation, this paper presents a new dataset of urban acoustic scenes recorded in binaural format, intended to support ASC research. The dataset comprises 2 hours and 52 minutes of urban sound recorded with a Neumann KU-100 binaural microphone, along with metadata that includes annotated scene types (e.g., indoor and outdoor) and detailed location labels such as urban parks, libraries, streets, subways, stores, and train stations. Recordings were made at diverse locations across New York City. The audio is segmented into 10-second clips to support machine learning and deep learning workflows. To ensure the overall quality of the dataset, all recordings were reviewed, and segments containing unwanted noise were removed. Because the binaural format preserves spatial auditory cues, the dataset is expected to support training models with more realistic, human-like perception and thereby improve classification accuracy. It is also expected to benefit a wide range of applications, such as AI-based recognition systems, AR/VR applications, and intelligent robotic services, where accurate environmental awareness is crucial. The dataset will be made publicly available to support reproducibility and advance further research in acoustic scene classification.
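The 10-second segmentation mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' pipeline: the function name `segment_clips`, the in-memory frame representation, and the choice to drop a trailing partial clip are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

# One (left, right) sample pair of a two-channel binaural signal.
Frame = Tuple[float, float]


def segment_clips(frames: Sequence[Frame], sample_rate: int,
                  clip_seconds: float = 10.0) -> List[Sequence[Frame]]:
    """Split a two-channel recording into consecutive fixed-length clips.

    A trailing remainder shorter than one clip is dropped. This is an
    assumption; the source does not say how partial segments were handled.
    """
    clip_len = int(round(sample_rate * clip_seconds))
    if clip_len <= 0:
        raise ValueError("clip length must be at least one sample")
    n_clips = len(frames) // clip_len
    # Slicing keeps left and right samples paired, so the interaural
    # cues within each clip are left untouched.
    return [frames[i * clip_len:(i + 1) * clip_len] for i in range(n_clips)]


# Toy example: 25 s of silence at a deliberately low rate of 100 Hz.
sample_rate = 100
recording = [(0.0, 0.0)] * (25 * sample_rate)
clips = segment_clips(recording, sample_rate)
print(len(clips), len(clips[0]))  # 2 1000
```

For scale, 2 hours and 52 minutes is 10,320 seconds, which corresponds to roughly 1,030 ten-second clips; the exact count in the released dataset is not stated in the abstract.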

Author(s): Kim, Yujin; Songmuang, Parichat; Chen, Enoch
Affiliation: New York University; New York University; New York University (See document for exact affiliation information.)
AES Convention: 159; Paper Number: 367
Publication Date: 2025-10-14

DOI:

This paper costs $33 for non-members and is free for AES members and E-Library subscribers.

Type: Express Paper
