Adi Yossi, Keshet Joseph, Cibelli Emily, Gustafson Erin, Clopper Cynthia, Goldrick Matthew
Department of Computer Science, Bar-Ilan University, Ramat-Gan, 52900, Israel.
Department of Linguistics, Northwestern University, Evanston, Illinois 60208, USA.
J Acoust Soc Am. 2016 Dec;140(6):4517. doi: 10.1121/1.4972527.
A key barrier to making phonetic studies scalable and replicable is the need to rely on subjective, manual annotation. To help meet this challenge, a machine learning algorithm was developed for automatic measurement of a widely used phonetic measure: vowel duration. Manually annotated data were used to train a model whose input is an arbitrary-length segment of the acoustic signal containing a single vowel preceded and followed by consonants, and whose output is the duration of that vowel. The model is based on the structured prediction framework. The input signal and a hypothesized pair of vowel onset and offset are mapped to an abstract vector space by a set of acoustic feature functions. The learning algorithm is trained in this space to minimize the difference in expectations between predicted and manually measured vowel durations. The trained model can then estimate vowel durations automatically, without phonetic or orthographic transcription. Results comparing the model to three sets of manually annotated data suggest that it outperformed the current gold standard for duration measurement, a hidden Markov model-based forced aligner (which requires an orthographic or phonetic transcription as input).
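The abstract frames duration measurement as a search over hypothesized (onset, offset) pairs, each scored as a weighted sum of feature functions. The Python sketch below illustrates that general structured-prediction pattern on toy data. It is not the authors' implementation. Everything specific is an assumption: per-frame energy as the only acoustic input, the four features in the hypothetical `phi`, a 10 ms frame step, and a structured perceptron as the learner. The paper's learner instead minimizes an expected discrepancy between predicted and manually measured durations, and its actual feature functions are not described in this abstract.

```python
# Hypothetical toy sketch of structured prediction for a vowel's (onset, offset).
# Assumption: each utterance is a list of per-frame energies standing in for real
# acoustic features. Onset is inclusive, offset exclusive, both in frame indices.


def phi(frames, onset, offset):
    """Hypothetical feature map phi(x, onset, offset) -> fixed-length vector."""
    inside = frames[onset:offset]
    # Energy rise entering the vowel and energy drop leaving it (0.0 at edges).
    rise = frames[onset] - frames[onset - 1] if onset > 0 else 0.0
    fall = frames[offset - 1] - frames[offset] if offset < len(frames) else 0.0
    return [sum(inside), float(len(inside)), rise, fall]


def score(w, frames, onset, offset):
    """Linear score w . phi(x, y) for one hypothesized boundary pair y."""
    return sum(wi * fi for wi, fi in zip(w, phi(frames, onset, offset)))


def candidates(n, min_len=1):
    """Enumerate every (onset, offset) pair spanning at least min_len frames."""
    for onset in range(n):
        for offset in range(onset + min_len, n + 1):
            yield onset, offset


def predict(w, frames, min_len=1):
    """Inference: exhaustive argmax over hypothesized boundaries (first wins ties)."""
    return max(candidates(len(frames), min_len),
               key=lambda y: score(w, frames, *y))


def duration_error(pred, gold, frame_ms=10.0):
    """Absolute predicted-vs-gold duration difference in ms (assumed frame step)."""
    return abs((pred[1] - pred[0]) - (gold[1] - gold[0])) * frame_ms


def train(examples, epochs=10, min_len=1):
    """Structured perceptron: a swapped-in learner, not the paper's objective."""
    w = [0.0] * 4
    for _ in range(epochs):
        mistakes = 0
        for frames, gold in examples:
            pred = predict(w, frames, min_len)
            if pred != gold:
                mistakes += 1
                # Move weights toward the gold features, away from the prediction.
                g, p = phi(frames, *gold), phi(frames, *pred)
                w = [wi + gi - pi for wi, gi, pi in zip(w, g, p)]
        if mistakes == 0:  # every training example already decoded correctly
            break
    return w


# Usage on two made-up utterances (consonant frames low, vowel frames high).
train_set = [
    ([0.1, 0.2, 0.1, 0.9, 1.0, 0.95, 0.9, 0.2, 0.1], (3, 7)),
    ([0.2, 0.1, 0.8, 0.9, 0.85, 0.1, 0.15], (2, 5)),
]
w = train(train_set)
boundaries = predict(w, train_set[0][0])  # (3, 7), i.e. a 40 ms vowel
```

Exhaustive search over all boundary pairs is quadratic in the number of frames, which is manageable for short consonant-vowel-consonant segments like the ones the abstract describes. Note that no transcription enters `predict`; only the signal and the learned weights are used.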