Tu Ming, Wisler Alan, Berisha Visar, Liss Julie M
Department of Speech and Hearing Science, Arizona State University, Tempe, Arizona 85287, USA.
School of Electrical, Computer and Energy Engineering, Arizona State University, Tempe, Arizona 85287,
J Acoust Soc Am. 2016 Nov;140(5):EL416. doi: 10.1121/1.4967208.
State-of-the-art automatic speech recognition (ASR) engines perform well on healthy speech; however, recent studies show that their performance on dysarthric speech is highly variable. This variability stems from the acoustic variability associated with the different dysarthria subtypes. This paper aims to develop a better understanding of how perceptual disturbances in dysarthric speech relate to ASR performance. Accurate ratings of a representative set of 32 dysarthric speakers along different perceptual dimensions are obtained, and the performance of a representative ASR algorithm on the same set of speakers is analyzed. This work explores the relationship between these ratings and ASR performance and reveals that ASR performance can be predicted from perceptual disturbances in dysarthric speech, with articulatory precision contributing the most to the prediction, followed by prosody.
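As an illustration of the kind of analysis the abstract describes (predicting ASR performance from perceptual ratings and comparing how much each dimension contributes), the following is a minimal sketch, not the authors' method. It assumes ordinary least squares on z-scored variables, so that coefficient magnitudes are comparable across dimensions. The data are synthetic placeholders: the rating scale, the weights, the use of word error rate as the performance measure, and the third dimension name (`vocal_quality`) are all assumptions made for illustration and are not taken from the paper.

```python
import numpy as np


def standardized_linear_fit(ratings, performance):
    """Fit performance ~ ratings by ordinary least squares on z-scored data.

    Returns (coefficients, r_squared). Because every column is z-scored,
    the magnitude of each coefficient gives a rough indication of how much
    that rating dimension contributes to the prediction.
    """
    X = (ratings - ratings.mean(axis=0)) / ratings.std(axis=0)
    y = (performance - performance.mean()) / performance.std()
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    residual = y - X @ coef
    r_squared = 1.0 - (residual @ residual) / (y @ y)
    return coef, r_squared


# Synthetic placeholder data (NOT from the paper): 32 speakers rated on
# three hypothetical perceptual dimensions using an assumed 1-7 scale.
rng = np.random.default_rng(0)
n_speakers = 32
dimension_names = ["articulatory_precision", "prosody", "vocal_quality"]
ratings = rng.uniform(1.0, 7.0, size=(n_speakers, len(dimension_names)))

# Arbitrary illustrative weights: higher ratings lower the simulated
# word error rate, with the first dimension weighted most heavily.
assumed_weights = np.array([-0.06, -0.03, 0.0])
word_error_rate = 0.9 + ratings @ assumed_weights + rng.normal(0.0, 0.02, n_speakers)

coef, r_squared = standardized_linear_fit(ratings, word_error_rate)
for name, c in zip(dimension_names, coef):
    print(f"{name}: standardized coefficient = {c:+.3f}")
print(f"R^2 = {r_squared:.3f}")
```

With real data, `ratings` would hold the per-speaker perceptual scores and `word_error_rate` the measured ASR performance for the same speakers. With only 32 speakers, any such fit should be checked with cross-validation before the coefficients are interpreted.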