Higher-Order Ambisonics (HOA) encoding from sparse, irregular microphone arrays remains a critical challenge for consumer spatial audio capture in immersive communication and XR. We propose Flow-HOA, a generative framework that jointly optimizes a multi-dimensional objective encompassing time-domain, spectral, and spatial fidelity while producing a deployable, time-invariant bank of Finite Impulse Response (FIR) encoding filters. Using conditional flow matching, the model learns to map a simple prior distribution to the target distribution of FIR filter coefficients. Training is guided by a composite loss that balances time-domain waveform fidelity, multi-resolution spectral consistency, sub-band energy preservation, and spatial directivity constraints. Objective evaluations on synthetically simulated data demonstrate improved performance over strong model-based baselines in both signal fidelity and spatial accuracy metrics. Subjective listening tests on real microphone array recordings further confirm that Flow-HOA yields higher overall sound quality with reduced artifacts, demonstrating generalization from synthetic training data to real-world capture conditions.
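As a rough illustration of what a "time-invariant bank of FIR encoding filters" does at deployment time, the sketch below applies one FIR filter per (output channel, microphone) pair and sums the filtered microphone signals into each output channel. It is not the authors' implementation: the names `convolve` and `encode_hoa`, the nested-list layout of the filter bank, and the toy sizes are all assumptions made for illustration, and a practical system would use an optimized array or DSP library.

```python
from typing import List

Signal = List[float]


def convolve(x: Signal, h: Signal) -> Signal:
    """Full linear convolution of signal x with FIR taps h."""
    y = [0.0] * (len(x) + len(h) - 1)
    for n, xn in enumerate(x):
        for j, hj in enumerate(h):
            y[n + j] += xn * hj
    return y


def encode_hoa(mics: List[Signal], fir_bank: List[List[Signal]]) -> List[Signal]:
    """Apply a fixed FIR filter bank to microphone signals.

    mics[m] is the signal of microphone m (all signals the same length).
    fir_bank[k][m] holds the taps mapping microphone m to output channel k
    (all filters the same length). This layout is an assumption.
    """
    out = []
    for filters_k in fir_bank:
        if len(filters_k) != len(mics):
            raise ValueError("each output channel needs one filter per microphone")
        acc = [0.0] * (len(mics[0]) + len(filters_k[0]) - 1)
        for x_m, h_km in zip(mics, filters_k):
            # Filter each microphone, then sum into the output channel.
            for n, v in enumerate(convolve(x_m, h_km)):
                acc[n] += v
        out.append(acc)
    return out


# Toy example (hypothetical sizes): 2 microphones, 2 output channels, 2 taps.
mics = [[1.0, 2.0, 3.0], [0.0, 1.0, 0.0]]
bank = [
    [[1.0, 0.0], [0.0, 0.0]],  # channel 0: microphone 0 passed through
    [[0.5, 0.0], [0.0, 1.0]],  # channel 1: 0.5 * mic 0 + mic 1 delayed by one sample
]
hoa = encode_hoa(mics, bank)
```

Because the filters are fixed after training, this encoding step is an ordinary multichannel convolution and needs no neural network at run time.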
Authors: You, Yuhuan; Qian, Yufan; Qu, Tianshu; Wang, Bin; Lv, Xueyang
Affiliation:
State Key Laboratory of General Artificial Intelligence, School of Intelligence Science and Technology, Peking University; State Key Laboratory of General Artificial Intelligence, School of Intelligence Science and Technology, Peking University; State Key Laboratory of General Artificial Intelligence, School of Intelligence Science and Technology, Peking University; Beijing Xiaomi Mobile Software Co., Ltd; Xiaomi Communications Co., Ltd
(See document for exact affiliation information.)
AES Convention: 160
Paper Number: 10293
Publication Date: 2026-05-28
Session subject: AI and Machine Learning in Audio, Recording, Production, and Reproduction

You, Yuhuan; Qian, Yufan; Qu, Tianshu; Wang, Bin; Lv, Xueyang; 2026; Flow-HOA: Generative Joint Optimization for Ambisonics Encoding via Flow Matching [PDF]; State Key Laboratory of General Artificial Intelligence, School of Intelligence Science and Technology, Peking University; State Key Laboratory of General Artificial Intelligence, School of Intelligence Science and Technology, Peking University; State Key Laboratory of General Artificial Intelligence, School of Intelligence Science and Technology, Peking University; Beijing Xiaomi Mobile Software Co., Ltd; Xiaomi Communications Co., Ltd; Paper 10293; Available from: https://aes.org/publications/elibrary-page/?id=23190
You, Yuhuan; Qian, Yufan; Qu, Tianshu; Wang, Bin; Lv, Xueyang; Flow-HOA: Generative Joint Optimization for Ambisonics Encoding via Flow Matching [PDF]; State Key Laboratory of General Artificial Intelligence, School of Intelligence Science and Technology, Peking University; State Key Laboratory of General Artificial Intelligence, School of Intelligence Science and Technology, Peking University; State Key Laboratory of General Artificial Intelligence, School of Intelligence Science and Technology, Peking University; Beijing Xiaomi Mobile Software Co., Ltd; Xiaomi Communications Co., Ltd; Paper 10293; 2026 Available: https://aes.org/publications/elibrary-page/?id=23190
@inproceedings{You2026flowhoa,
  title={{Flow-HOA: Generative Joint Optimization for Ambisonics Encoding via Flow Matching}},
  author={You, Yuhuan and Qian, Yufan and Qu, Tianshu and Wang, Bin and Lv, Xueyang},
  year={2026},
  month={jun},
  booktitle={Journal of the Audio Engineering Society},
  number={10293},
  organization={AES},
}