AES E-Library

A Parametric Dual-Channel Audio Coding via Learned Time-Frequency Masking

While Neural Audio Codecs (NAC) have revolutionized monaural audio compression, achieving high-fidelity dual-channel coding at low bitrates remains a significant challenge. Existing approaches often rely on naive independent channel quantization, leading to phase incoherence, or entangled latent modeling, which sacrifices spatial precision for spectral energy. This paper proposes a novel dual-channel coding framework based on content-spatial disentanglement. Reframing spatial reconstruction as an informed source separation task, our architecture synergizes a frozen, pre-trained DAC encoder for robust mono content preservation with a parameter-efficient side information encoder that predicts fine-grained time-frequency masks. To ensure precise spatial imaging, we introduce explicit physical constraints into the end-to-end training. Experimental results indicate that at low bitrates of 9 and 11 kbps, the proposed method outperforms state-of-the-art dual-mono neural baselines and industry standards in both objective spatial metrics and subjective MUSHRA evaluations.
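
The idea of rebuilding two channels from a mono signal plus time-frequency masks can be illustrated with a minimal sketch. This is not the paper's implementation: the abstract gives no architectural details, so the sketch assumes a plain (L + R) / 2 downmix in place of the DAC codec path, and computes oracle magnitude-ratio masks from the true channels where the paper would use masks predicted by its learned side information encoder. All function names (`stft`, `oracle_masks`, `apply_masks`) and parameters (`n_fft`, `hop`, `max_gain`) are hypothetical.

```python
import numpy as np


def stft(x, n_fft=512, hop=128):
    """Hann-windowed STFT; returns shape (frames, n_fft // 2 + 1)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack(
        [x[i * hop:i * hop + n_fft] * window for i in range(n_frames)]
    )
    return np.fft.rfft(frames, axis=1)


def oracle_masks(left_spec, right_spec, max_gain=2.0, eps=1e-8):
    """Downmix to mono and derive one real-valued mask per channel.

    Assumption: a real codec would predict these masks from compact
    side information; here they are computed from the true channels.
    """
    mono_spec = 0.5 * (left_spec + right_spec)
    mono_mag = np.abs(mono_spec) + eps
    # Bounded magnitude-ratio masks, one value per time-frequency bin.
    mask_left = np.clip(np.abs(left_spec) / mono_mag, 0.0, max_gain)
    mask_right = np.clip(np.abs(right_spec) / mono_mag, 0.0, max_gain)
    return mono_spec, mask_left, mask_right


def apply_masks(mono_spec, mask_left, mask_right):
    """Rebuild both channels by scaling the mono spectrum bin by bin."""
    return mask_left * mono_spec, mask_right * mono_spec


# Toy input: one noise source amplitude-panned toward the left channel.
rng = np.random.default_rng(0)
source = rng.standard_normal(4096)
left_spec = stft(0.8 * source)
right_spec = stft(0.2 * source)

mono_spec, mask_left, mask_right = oracle_masks(left_spec, right_spec)
left_hat, right_hat = apply_masks(mono_spec, mask_left, mask_right)
```

Because the masks here are real-valued, both reconstructed channels inherit the phase of the mono spectrum. That is exact for the amplitude-panned toy signal above, but it cannot represent inter-channel time or phase differences, which is one reason a practical system would need richer side information or additional constraints.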

 


