In this paper, we present an automatic approach for aligning speech signals to corresponding text documents. To this end, we propose to first use text-to-speech synthesis (TTS) to obtain a speech signal from the textual representation. Subsequently, both speech signals are transformed into sequences of audio features, which are then time-aligned using a greedy variant of dynamic time warping (DTW). The proposed approach is efficient (with linear running time), computationally simple, and does not rely on a prior training phase, as is necessary for HMM-based approaches. It benefits from the combination of a) a novel type of speech feature that is correlated with the phonetic progression of speech, b) a greedy left-to-right variant of DTW, and c) the TTS-based approach for creating a feature representation from the input text documents. The feasibility of the proposed method is demonstrated in several experiments.
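The alignment step named in the abstract can be illustrated with a minimal sketch of a generic greedy left-to-right DTW. This is not the authors' implementation: the abstract does not specify the step set, the cost function, or the speech features, so the function names `greedy_dtw` and `euclidean`, the three allowed steps, and the toy one-dimensional feature vectors are all assumptions made for illustration. The sketch only shows why a greedy variant runs in linear time: each iteration advances at least one index and never revisits a cell.

```python
from math import sqrt


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def greedy_dtw(seq_a, seq_b, dist=euclidean):
    """Return a greedy left-to-right alignment path between two sequences.

    Hypothetical illustration: at each step the path moves to whichever
    neighbouring cell (diagonal, down, right) has the lowest local cost.
    No cost matrix is filled in, so the number of steps is at most
    len(seq_a) + len(seq_b) - 2.
    """
    if not seq_a or not seq_b:
        raise ValueError("both sequences must be non-empty")

    last_a, last_b = len(seq_a) - 1, len(seq_b) - 1
    i, j = 0, 0
    path = [(0, 0)]

    while (i, j) != (last_a, last_b):
        # Collect the steps that stay inside both sequences.
        candidates = []
        if i < last_a and j < last_b:
            candidates.append((i + 1, j + 1))  # advance both sequences
        if i < last_a:
            candidates.append((i + 1, j))      # advance seq_a only
        if j < last_b:
            candidates.append((i, j + 1))      # advance seq_b only

        # Greedy choice: cheapest local cost; ties favour the diagonal
        # because it is listed first.
        i, j = min(candidates, key=lambda c: dist(seq_a[c[0]], seq_b[c[1]]))
        path.append((i, j))

    return path


# Toy stand-ins for the feature sequences of the TTS signal and the recording.
tts_features = [[0.0], [1.0], [2.0], [3.0]]
recorded_features = [[0.0], [0.0], [1.0], [2.0], [3.0]]
alignment = greedy_dtw(tts_features, recorded_features)
```

Because the choice at each step is purely local, a greedy path can drift away from the globally optimal alignment that full DTW would find; the sketch makes no attempt to correct for that.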
Author(s): Damm, David; Grohganz, Harald; Kurth, Frank; Ewert, Sebastian; Clausen, Michael
Affiliation:
Fraunhofer Institute for Communication, Information Processing and Ergonomics (FKIE), Wachtberg, Germany; University of Bonn, Bonn, Germany
(See document for exact affiliation information.)
Publication Date:
2011-07-06
Session subject:
Speech Processing and Analysis
DOI:

Damm, David; Grohganz, Harald; Kurth, Frank; Ewert, Sebastian; Clausen, Michael; 2011; SyncTS: Automatic Synchronization of Speech and Text Documents [PDF]; Fraunhofer Institute for Communication, Information Processing and Ergonomics (FKIE), Wachtberg, Germany; University of Bonn, Bonn, Germany; Paper 2-2; Available from: https://aes.org/publications/elibrary-page/?id=15962
Damm, David; Grohganz, Harald; Kurth, Frank; Ewert, Sebastian; Clausen, Michael; SyncTS: Automatic Synchronization of Speech and Text Documents [PDF]; Fraunhofer Institute for Communication, Information Processing and Ergonomics (FKIE), Wachtberg, Germany; University of Bonn, Bonn, Germany; Paper 2-2; 2011 Available: https://aes.org/publications/elibrary-page/?id=15962
@inproceedings{Damm2011syncts:,
title={{SyncTS: Automatic Synchronization of Speech and Text Documents}},
author={Damm, David and Grohganz, Harald and Kurth, Frank and Ewert, Sebastian and Clausen, Michael},
year={2011},
month={jul},
booktitle={Journal of the Audio Engineering Society},
publisher={Paper 2-2; AES Conference: 42nd International Conference: Semantic Audio; July 2011},
number={2-2},
organization={AES},
}