With the advent of Web Audio, a compliant browser offers a toolbox of audio production components out of the box. This could prove valuable for any content producer in the audio field, professional or amateur. In this paper we add deep learning to this scenario, with the ultimate goal of machine-assisted real-time audio production in the browser. As a proof of concept, we implement a basic yet complete one-channel design that uses deep learning to assist an automatic filtering algorithm that produces parameters for adjusting an audio signal in the Web Audio context of a browser. To achieve this, we evaluate five-class audio prediction models and compare their accuracy at the model-building stage with their accuracy when exported to the real-time context. We present two ways to measure this accuracy in real time, as well as a method to reduce jumpiness in real-time predictions when classification scores are ambiguous. We highlight some important limitations and present a refined model, designed for our domain-specific audio set and based on architectures from previous research, and find that our model outperforms these architectures.
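The abstract does not say how jumpiness is reduced, so the following is only an illustrative sketch of one common approach and not the paper's method. It assumes each audio frame yields one score per class (five in the paper), smooths those scores with an exponential moving average, and switches the reported class only when a rival leads the current class by a fixed margin. The class name `PredictionSmoother` and the parameters `alpha` and `margin` are hypothetical.

```typescript
// Hypothetical sketch: NOT the method from the paper, which the abstract does not describe.
// Smooths per-frame class scores and applies a switching margin (hysteresis).

function argmax(values: number[]): number {
  let best = 0;
  for (let i = 1; i < values.length; i++) {
    if (values[i] > values[best]) best = i;
  }
  return best;
}

class PredictionSmoother {
  private alpha: number;   // weight of the newest frame in the moving average (assumed value)
  private margin: number;  // lead a rival class needs before we switch (assumed value)
  private smoothed: number[] | null = null;
  private current = -1;    // index of the currently reported class, -1 before the first frame

  constructor(alpha: number = 0.3, margin: number = 0.1) {
    this.alpha = alpha;
    this.margin = margin;
  }

  // Feed one frame of class scores; returns the stabilised class index.
  update(scores: number[]): number {
    if (this.smoothed === null) {
      this.smoothed = scores.slice();
    } else {
      if (scores.length !== this.smoothed.length) {
        throw new Error("score vector length changed between frames");
      }
      const a = this.alpha;
      this.smoothed = this.smoothed.map((s, i) => (1 - a) * s + a * scores[i]);
    }
    const best = argmax(this.smoothed);
    const shouldSwitch =
      this.current === -1 ||
      (best !== this.current &&
        this.smoothed[best] - this.smoothed[this.current] >= this.margin);
    if (shouldSwitch) this.current = best;
    return this.current;
  }
}
```

With this design, a single frame whose top two scores are nearly tied leaves the reported class unchanged, while a sustained shift in the scores eventually crosses the margin and triggers a switch. Larger `margin` or smaller `alpha` values trade responsiveness for stability.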
Author(s): Sigvardson, Tor
Affiliation: Blekinge Institute of Technology - BTH, Karlskrona, Sweden and Swedish Radio, Stockholm, Sweden
(See document for exact affiliation information.)
AES Convention: 153
Paper Number: 10616
Publication Date: 2022-10-06
Session subject: Signal Processing
DOI:

Sigvardson, Tor; 2022; Combining Deep Learning and Web Audio for Automated Real-time Audio Speech Production [PDF]; Blekinge Institute of Technology - BTH, Karlskrona, Sweden and Swedish Radio, Stockholm, Sweden; Paper 10616; Available from: https://aes.org/publications/elibrary-page/?id=21945
Sigvardson, Tor; Combining Deep Learning and Web Audio for Automated Real-time Audio Speech Production [PDF]; Blekinge Institute of Technology - BTH, Karlskrona, Sweden and Swedish Radio, Stockholm, Sweden; Paper 10616; 2022 Available: https://aes.org/publications/elibrary-page/?id=21945
@inproceedings{Sigvardson2022combining,
title={{Combining Deep Learning and Web Audio for Automated Real-time Audio Speech Production}},
author={Sigvardson, Tor},
year={2022},
month={oct},
booktitle={Journal of the Audio Engineering Society},
publisher={Paper 10616; AES Convention 153; October 2022},
number={10616},
organization={AES},
}