Long-Term Fundamental Frequency Modeling Based on Wavelet Packet Transform for Voice Conversion

He, Weijun; Zhao, Yongyong; Lin, Pei; He, Yuxin; Feng, Qi

AES E-Library


Prosody conversion is an important part of voice conversion. Fundamental frequency (F0), which carries important speaker-individuality information (e.g., tone and intonation), is regarded as one of the key prosodic features in the excitation model for speech synthesis. In the conventional approach to modeling F0 based on the continuous wavelet transform, analysis is carried out at the frame level and is prone to losing high-frequency information during decomposition and reconstruction. To address this problem, this paper presents a representation of long-term F0 based on the Wavelet Packet Transform (WPT). Specifically, the long-term F0 is decomposed using WPT, and a joint vector is formed by combining the resulting average power spectrum. The method is then applied in a voice conversion system, and voice conversion experiments are conducted on Chinese and English speech data to evaluate its performance. The results show that the proposed method clearly outperforms the wavelet-transform-based method in all conversion scenarios, but performs slightly worse than the mean-and-variance-based method in the same-gender conversion scenario.
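The abstract does not give implementation details, so the following is only a rough sketch of the general idea: decompose a long-term log-F0 contour with a wavelet packet transform and collect the average power of each terminal subband into one vector. The Haar wavelet, the decomposition level, linear interpolation over unvoiced frames (assumed to be marked as 0), log scaling, and edge padding are all assumptions made for this illustration, not the authors' settings. The helper names `interpolate_unvoiced`, `haar_wpt`, and `wpt_power_vector` are hypothetical.

```python
import numpy as np

# Illustrative sketch only. Assumptions (not from the paper): Haar wavelet,
# unvoiced frames marked as 0 and filled by linear interpolation, log-F0 scale,
# edge padding to a multiple of 2**level, mean squared coefficient as "average power".


def interpolate_unvoiced(f0):
    """Fill unvoiced frames (F0 == 0) by linear interpolation between voiced frames."""
    f0 = np.asarray(f0, dtype=float)
    voiced = f0 > 0
    if not voiced.any():
        raise ValueError("contour has no voiced frames")
    idx = np.arange(len(f0))
    # np.interp holds the first/last voiced value constant beyond the ends.
    return np.interp(idx, idx[voiced], f0[voiced])


def haar_wpt(x, level):
    """Full Haar wavelet packet decomposition down to `level`.

    Returns the 2**level terminal subbands in frequency order (lowest band first).
    The input is edge-padded to a multiple of 2**level.
    """
    x = np.asarray(x, dtype=float)
    pad = (-len(x)) % (2 ** level)
    nodes = [np.pad(x, (0, pad), mode="edge")]
    for _ in range(level):
        next_nodes = []
        for node in nodes:
            even, odd = node[0::2], node[1::2]
            next_nodes.append((even + odd) / np.sqrt(2.0))  # approximation (low-pass)
            next_nodes.append((even - odd) / np.sqrt(2.0))  # detail (high-pass)
        nodes = next_nodes
    # Nodes are now in natural (Paley) order; frequency band k is node gray(k).
    return [nodes[k ^ (k >> 1)] for k in range(len(nodes))]


def wpt_power_vector(f0, level=3):
    """Average power of each WPT subband of the log-F0 contour, as one vector."""
    log_f0 = np.log(interpolate_unvoiced(f0))
    subbands = haar_wpt(log_f0, level)
    return np.array([np.mean(band ** 2) for band in subbands])
```

Because the Haar packet basis is orthonormal, the subband powers weighted by subband length sum to the energy of the padded contour. The vector therefore summarizes how the energy of the whole contour is spread across frequency bands, rather than describing a single analysis frame.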

Affiliations: School of Electronics and Information, Guangdong Polytechnic Normal University, Guangzhou, China (all authors; see document for exact affiliation information).
JAES Volume 72 Issue 3 pp. 161-169; March 2024
Publication Date: March 5, 2024
Permalink: https://www.aes.org/e-lib/browse.cfm?elib=22387


E-Library Location: (CD JAES72) /jaes72/3/pg161.pdf

DOI: https://doi.org/10.17743/jaes.2022.0126
