Performance of Forced-Alignment Algorithms on Children's Speech

Authors

Tristan J. Mahr, Visar Berisha, Kan Kawabata, Julie Liss, Katherine C. Hustad

Affiliations

Waisman Center, University of Wisconsin-Madison.

Department of Communication Sciences and Disorders, Arizona State University, Tempe.

Publication information

J Speech Lang Hear Res. 2021 Jun 18;64(6S):2213-2222. doi: 10.1044/2020_JSLHR-20-00268. Epub 2021 Mar 11.

DOI: 10.1044/2020_JSLHR-20-00268

PMID: 33705675

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8740721/

Abstract

Purpose: Acoustic measurement of speech sounds requires first segmenting the speech signal into relevant units (words, phones, etc.). Manual segmentation is cumbersome and time consuming. Forced-alignment algorithms automate this process by aligning a transcript and a speech sample. We compared the phoneme-level alignment performance of five available forced-alignment algorithms on a corpus of child speech. Our goal was to document aligner performance for child speech researchers.

Method: The child speech sample included 42 children between 3 and 6 years of age. The corpus was force-aligned using the Montreal Forced Aligner with and without speaker adaptive training, triphone alignment from the Kaldi speech recognition engine, the Prosodylab-Aligner, and the Penn Phonetics Lab Forced Aligner. The sample was also manually aligned to create gold-standard alignments. We evaluated alignment algorithms in terms of accuracy (whether the interval covers the midpoint of the manual alignment) and difference in phone-onset times between the automatic and manual intervals.

Results: The Montreal Forced Aligner with speaker adaptive training showed the highest accuracy and smallest timing differences. Vowels were consistently the most accurately aligned class of sounds across all the aligners, and alignment accuracy increased with age for fricative sounds across the aligners too.

Conclusion: The best-performing aligner fell just short of human-level reliability for forced alignment. Researchers can use forced alignment with child speech for certain classes of sounds (vowels, fricatives for older children), especially as part of a semi-automated workflow where alignments are later inspected for gross errors.

Supplemental Material: https://doi.org/10.23641/asha.14167058
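The two evaluation measures named in the Method can be illustrated with a minimal sketch. This is not the authors' code: the `Interval` class, the function names, and the toy timings are hypothetical, and summarizing onset differences by their mean absolute value is an assumption made here for illustration (the abstract only says that onset differences were compared).

```python
from dataclasses import dataclass


@dataclass
class Interval:
    """One phone interval: label plus onset and offset in seconds (hypothetical type)."""
    label: str
    start: float
    end: float


def covers_midpoint(auto: Interval, manual: Interval) -> bool:
    """True if the automatic interval contains the midpoint of the manual interval."""
    midpoint = (manual.start + manual.end) / 2
    return auto.start <= midpoint <= auto.end


def onset_difference(auto: Interval, manual: Interval) -> float:
    """Signed onset difference in seconds (positive means the automatic onset is late)."""
    return auto.start - manual.start


def score(auto_phones, manual_phones):
    """Return (accuracy, mean absolute onset difference) over paired phones.

    Assumes both lists contain the same phones in the same order, which holds
    when both alignments come from the same transcript.
    """
    if not manual_phones or len(auto_phones) != len(manual_phones):
        raise ValueError("alignments must be non-empty and the same length")
    pairs = list(zip(auto_phones, manual_phones))
    hits = sum(covers_midpoint(a, m) for a, m in pairs)
    mean_abs = sum(abs(onset_difference(a, m)) for a, m in pairs) / len(pairs)
    return hits / len(pairs), mean_abs


# Toy data (invented timings for the word "cat"), not taken from the study.
manual = [Interval("k", 0.00, 0.10), Interval("ae", 0.10, 0.30), Interval("t", 0.30, 0.40)]
auto = [Interval("k", 0.00, 0.04), Interval("ae", 0.04, 0.28), Interval("t", 0.28, 0.42)]

accuracy, mean_abs_onset = score(auto, manual)
print(f"accuracy={accuracy:.2f}, mean |onset diff|={mean_abs_onset * 1000:.0f} ms")
```

In this toy case the automatic /k/ interval ends before the midpoint of the manual /k/, so it counts as a miss even though its onset matches exactly. That is why the two measures are reported side by side: one captures overlap, the other boundary placement.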

