A Parametric Dual-Channel Audio Coding via Learned Time-Frequency Masking

Wang, Yihan and Qian, Yufan and Huang, Qingbo and Qu, Tianshu

AES E-Library

While Neural Audio Codecs (NACs) have revolutionized monaural audio compression, achieving high-fidelity dual-channel coding at low bitrates remains a significant challenge. Existing approaches often rely either on naive independent channel quantization, which leads to phase incoherence, or on entangled latent modeling, which sacrifices spatial precision in favor of spectral energy. This paper proposes a novel dual-channel coding framework based on content-spatial disentanglement. By reframing spatial reconstruction as an informed source separation task, our architecture combines a frozen, pre-trained DAC encoder for robust mono content preservation with a parameter-efficient side information encoder that predicts fine-grained time-frequency masks. To ensure precise spatial imaging, we introduce explicit physical constraints into the end-to-end training. Experimental results indicate that at low bitrates of 9 and 11 kbps, the proposed method outperforms state-of-the-art dual-mono neural baselines and industry standards in both objective spatial metrics and subjective MUSHRA evaluations.
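The informed-separation framing in the abstract can be illustrated with a toy example. A mono downmix carries the content, and per-bin masks sent as side information redistribute that content to the two channels. The sketch below is a minimal illustration built on stated assumptions, none of which come from the paper: a mid downmix M = (L + R) / 2, a real-valued mask giving each bin's left-channel magnitude share, uniform 4-bit mask quantization, and a non-overlapping rectangular STFT. The function names and `N_FFT` are hypothetical. The frozen DAC path is replaced by an identity path, and the learned side information encoder is replaced by an oracle mask computed from the true channels.

```python
import numpy as np

N_FFT = 512  # hypothetical frame size; the paper's STFT settings are not given here


def stft(x: np.ndarray) -> np.ndarray:
    """Non-overlapping rectangular-window STFT (trivially invertible, no overlap-add)."""
    n_frames = len(x) // N_FFT
    frames = x[: n_frames * N_FFT].reshape(n_frames, N_FFT)
    return np.fft.rfft(frames, axis=1)


def istft(X: np.ndarray) -> np.ndarray:
    """Inverse of stft(): per-frame inverse real FFT, frames concatenated."""
    return np.fft.irfft(X, n=N_FFT, axis=1).reshape(-1)


def oracle_mask(L: np.ndarray, R: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Left-channel magnitude share per T-F bin (right share is 1 - mask).

    Assumption: stands in for what a learned side-information encoder would predict.
    """
    aL, aR = np.abs(L), np.abs(R)
    return aL / (aL + aR + eps)


def quantize(mask: np.ndarray, bits: int = 4) -> np.ndarray:
    """Uniform scalar quantization of mask values in [0, 1] to 2**bits levels."""
    levels = 2 ** bits - 1
    return np.round(mask * levels) / levels


def upmix(M: np.ndarray, mask: np.ndarray):
    """Rebuild both channels from the mono downmix M = (L + R) / 2.

    The two estimates always sum to 2 * M = L + R, so the downmix is preserved.
    """
    return 2.0 * mask * M, 2.0 * (1.0 - mask) * M


rng = np.random.default_rng(0)
source = rng.standard_normal(N_FFT * 8)
left, right = 0.9 * source, 0.3 * source      # amplitude-panned test source
L, R = stft(left), stft(right)
M = 0.5 * (L + R)                             # mono content path (identity here, a codec in practice)
mask = quantize(oracle_mask(L, R))            # side-information path
L_hat, R_hat = upmix(M, mask)
left_hat, right_hat = istft(L_hat), istft(R_hat)
```

Here every bin's true share is 0.75, which 4-bit quantization maps to 11/15. The reconstructed gains therefore come out at about 0.88 and 0.32 instead of 0.9 and 0.3, which shows how side-information resolution limits level accuracy. The downmix itself is reproduced exactly. A real-valued mask like this one also cannot restore inter-channel phase differences that the downmix has discarded.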

Author(s): Wang, Yihan; Qian, Yufan; Huang, Qingbo; Qu, Tianshu
Affiliation: State Key Laboratory of General Artificial Intelligence, School of Intelligence Science and Technology, Peking University; State Key Laboratory of General Artificial Intelligence, School of Intelligence Science and Technology, Peking University; State Key Laboratory of General Artificial Intelligence, School of Intelligence Science and Technology, Peking University; MMLab, ByteDance (See document for exact affiliation information.)
AES Convention: 160; Paper Number: 10294
Publication Date: 2026-05-28
Session subject: AI and Machine Learning in Audio, Audio Processing

This paper costs $33 for non-members and is free for AES members and E-Library subscribers.

Type: Convention Paper