Department of Electrical Engineering, National Tsing Hua University, No. 101, Sec. 2, Kuang-Fu Road, Hsinchu, Taiwan 30013.
J Acoust Soc Am. 2011 Jul;130(1):514-25. doi: 10.1121/1.3592233.
The voice onset time (VOT) of a stop consonant is the interval between its burst onset and voicing onset. One long-studied question in VOT research is how to measure VOT efficiently. Manual annotation is feasible but becomes time-consuming for large corpora. This paper proposes an automatic VOT estimation method based on an onset detection algorithm. First, forced alignment is applied to locate the stop consonants. Then a random-forest-based onset detector searches each stop segment for its burst and voicing onsets, from which the VOT is estimated. The proposed onset detector is efficient and accurate and requires only a small amount of training data. The evaluation data, extracted from the TIMIT corpus, comprised 2344 words with a word-initial stop. In the experiments, 83.4% of the estimates deviated by less than 10 ms from their manually labeled values, and 96.5% deviated by less than 20 ms. Factors that influence the proposed method, such as the place of articulation and voicing of the stop consonant and the quality of the succeeding vowel, were also investigated.
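As a minimal sketch of the quantities reported above, the following Python computes a VOT from a pair of onset times and the fraction of estimates that fall within a given tolerance of the manual labels. The function names `vot_ms` and `fraction_within` and all onset times are hypothetical illustrations, not the authors' code or TIMIT data. The sketch assumes the burst and voicing onsets have already been located; it does not implement the forced alignment or the random forest detector.

```python
from typing import Sequence


def vot_ms(burst_onset_s: float, voicing_onset_s: float) -> float:
    """Return VOT in milliseconds: voicing onset minus burst onset."""
    return (voicing_onset_s - burst_onset_s) * 1000.0


def fraction_within(
    estimated_ms: Sequence[float],
    reference_ms: Sequence[float],
    tolerance_ms: float,
) -> float:
    """Fraction of estimates deviating by less than tolerance_ms from the reference."""
    if len(estimated_ms) != len(reference_ms):
        raise ValueError("estimated and reference must have the same length")
    if not estimated_ms:
        raise ValueError("at least one estimate is required")
    hits = sum(
        1 for est, ref in zip(estimated_ms, reference_ms)
        if abs(est - ref) < tolerance_ms
    )
    return hits / len(estimated_ms)


# Hypothetical (burst onset, voicing onset) pairs in seconds; not TIMIT data.
estimated = [vot_ms(b, v) for b, v in [(0.100, 0.162), (0.250, 0.268), (0.400, 0.485)]]
reference = [vot_ms(b, v) for b, v in [(0.102, 0.160), (0.250, 0.281), (0.395, 0.455)]]

within_10 = fraction_within(estimated, reference, 10.0)
within_20 = fraction_within(estimated, reference, 20.0)
print(f"within 10 ms: {within_10:.1%}, within 20 ms: {within_20:.1%}")
```

Applied to the detector's output and the manual labels, the same two calls would give figures of the kind the abstract reports at the 10 ms and 20 ms tolerances.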