While Neural Audio Codecs (NAC) have revolutionized monaural audio compression, achieving high-fidelity dual-channel coding at low bitrates remains a significant challenge. Existing approaches often rely on naive independent channel quantization, leading to phase incoherence, or entangled latent modeling, which sacrifices spatial precision for spectral energy. This paper proposes a novel dual-channel coding framework based on content-spatial disentanglement. Reframing spatial reconstruction as an informed source separation task, our architecture synergizes a frozen, pre-trained DAC encoder for robust mono content preservation with a parameter-efficient side information encoder that predicts fine-grained time-frequency masks. To ensure precise spatial imaging, we introduce explicit physical constraints into the end-to-end training. Experimental results indicate that at low bitrates of 9 and 11 kbps, the proposed method outperforms state-of-the-art dual-mono neural baselines and industry standards in both objective spatial metrics and subjective MUSHRA evaluations.
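The masking step described in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation: the paper predicts its masks with a learned side information encoder and codes the mono content with a DAC encoder, neither of which appears here. The sketch assumes a passive downmix (the average of the two channels), uses real-valued "oracle" magnitude masks computed directly from the true channels as a stand-in for predicted masks, and operates on synthetic complex spectrograms. The function names `downmix`, `oracle_magnitude_masks`, and `apply_masks` are hypothetical.

```python
import numpy as np


def downmix(left, right):
    """Passive mono downmix of two complex spectrograms (an assumption)."""
    return 0.5 * (left + right)


def oracle_magnitude_masks(left, right, eps=1e-8):
    """Real-valued masks that map the mono magnitude to each channel's magnitude.

    These are computed from the true channels, so they only stand in for
    the masks a trained side-information model would predict.
    """
    denom = np.abs(downmix(left, right)) + eps
    return np.abs(left) / denom, np.abs(right) / denom


def apply_masks(mono, mask_left, mask_right):
    """Rebuild both channels by scaling each time-frequency bin of the mono signal."""
    return mask_left * mono, mask_right * mono


# Synthetic spectrograms, shape (frequency bins, frames); right is a quieter,
# slightly decorrelated copy of left.
rng = np.random.default_rng(0)
shape = (129, 50)
left = rng.standard_normal(shape) + 1j * rng.standard_normal(shape)
noise = rng.standard_normal(shape) + 1j * rng.standard_normal(shape)
right = 0.3 * left + 0.1 * noise

mono = downmix(left, right)
mask_left, mask_right = oracle_magnitude_masks(left, right)
left_hat, right_hat = apply_masks(mono, mask_left, mask_right)

# Inter-channel level difference in dB, before and after reconstruction.
ild_true = 20 * np.log10(np.abs(left).mean() / np.abs(right).mean())
ild_hat = 20 * np.log10(np.abs(left_hat).mean() / np.abs(right_hat).mean())
print(f"ILD true: {ild_true:.2f} dB, reconstructed: {ild_hat:.2f} dB")
```

With real-valued masks, both reconstructed channels inherit the phase of the mono signal, so this sketch recovers level differences between the channels but not their phase differences.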
Authors: Wang, Yihan; Qian, Yufan; Huang, Qingbo; Qu, Tianshu
Affiliation: State Key Laboratory of General Artificial Intelligence, School of Intelligence Science and Technology, Peking University; State Key Laboratory of General Artificial Intelligence, School of Intelligence Science and Technology, Peking University; State Key Laboratory of General Artificial Intelligence, School of Intelligence Science and Technology, Peking University; MMLab, ByteDance
(See document for exact affiliation information.)
AES Convention: 160
Paper Number: 10294
Publication Date: 2026-05-28
Session subject: AI and Machine Learning in Audio, Audio Processing
DOI:

Wang, Yihan; Qian, Yufan; Huang, Qingbo; Qu, Tianshu; 2026; A Parametric Dual-Channel Audio Coding via Learned Time-Frequency Masking [PDF]; State Key Laboratory of General Artificial Intelligence, School of Intelligence Science and Technology, Peking University; State Key Laboratory of General Artificial Intelligence, School of Intelligence Science and Technology, Peking University; State Key Laboratory of General Artificial Intelligence, School of Intelligence Science and Technology, Peking University; MMLab, ByteDance; Paper 10294; Available from: https://aes.org/publications/elibrary-page/?id=23191
Wang, Yihan; Qian, Yufan; Huang, Qingbo; Qu, Tianshu; A Parametric Dual-Channel Audio Coding via Learned Time-Frequency Masking [PDF]; State Key Laboratory of General Artificial Intelligence, School of Intelligence Science and Technology, Peking University; State Key Laboratory of General Artificial Intelligence, School of Intelligence Science and Technology, Peking University; State Key Laboratory of General Artificial Intelligence, School of Intelligence Science and Technology, Peking University; MMLab, ByteDance; Paper 10294; 2026; Available: https://aes.org/publications/elibrary-page/?id=23191
@inproceedings{Wang2026a,
title={{A Parametric Dual-Channel Audio Coding via Learned Time-Frequency Masking}},
author={Wang, Yihan and Qian, Yufan and Huang, Qingbo and Qu, Tianshu},
year={2026},
month={jun},
booktitle={Audio Engineering Society Convention 160},
publisher={},
number={10294},
organization={AES},
}