Maximizing accuracy of forced alignment for spontaneous child speech

Robert Fromont; Lynn Clark; Joshua Wilson Black; Margaret Blackwood

doi:10.34842/shrr-sv10

Abstract

Sociophonetic study of large speech corpora generally requires forced alignment, the automatic process of determining the start and end times of each speech sound within a recording, to support large-scale automated extraction of acoustic measurements of targeted vowels or consonants. An extensive literature evaluates the alignment accuracy of a number of forced alignment tools and procedures on speech data from a range of languages and dialects. In general, these evaluations use typical adult speech data, often elicited in a controlled laboratory environment. There is little literature on the effectiveness of forced alignment systems on child speech, and none on speech elicited in field environments. This presents a problem for research at the intersection of language acquisition and sociophonetics, as there is no established best practice for automatically aligning child speech. Child speech presents special challenges to automated tools, as it includes more variation in speech sounds and voice quality, as well as non-standard pronunciation and prosody. We evaluated two toolkits, Kaldi (via the Montreal Forced Aligner, MFA) and the Hidden Markov Model Toolkit (HTK), using different configurations to force-align non-rhotic child speech elicited in a preschool environment. Against many of our expectations, we found that MFA, using rhotic acoustic models pre-trained on adult speech, performed best. This paper provides a clear methodology that other researchers in sociophonetics can use to evaluate the success or otherwise of phonetic alignment.

Keywords: child speech, language acquisition, sociophonetics, speech corpora, forced alignment

Published on
2023-09-01

Peer Reviewed

License

Creative Commons Attribution-Noncommercial 4.0 International

Authors

Robert Fromont (New Zealand Institute of Language, Brain and Behaviour, University of Canterbury)
Lynn Clark (New Zealand Institute of Language, Brain and Behaviour, University of Canterbury)
Joshua Wilson Black (New Zealand Institute of Language, Brain and Behaviour, University of Canterbury)
Margaret Blackwood (New Zealand Institute of Language, Brain and Behaviour, University of Canterbury)

Issue

Volume 3 • Issue 1 • 2023

Identifiers

DOI: https://doi.org/10.34842/shrr-sv10

Publication details

Pages: 182-210
Article Number: 7
Accepted on: 2023-08-04
