Arora Vipul, Lahiri Aditi, Reetz Henning
Faculty of Linguistics, Philology and Phonetics, University of Oxford, Oxford, United Kingdom.
Goethe University, Frankfurt am Main, Germany.
J Acoust Soc Am. 2018 Jan;143(1):98. doi: 10.1121/1.5017834.
The authors address the question of whether phonological features can be used effectively in an automatic speech recognition (ASR) system for pronunciation training in non-native language (L2) learning. Computer-aided pronunciation training consists of two essential tasks: detecting mispronunciations and providing corrective feedback, usually on the basis of either full words or phonemes. Phonemes, however, can be further decomposed into phonological features, which in turn define groups of phonemes. A phonological feature-based ASR system allows the authors to perform a sub-phonemic analysis at the feature level, providing more effective feedback for reaching the acoustic goal and perceptual constancy. Furthermore, phonological features provide a structured way of analysing the types of errors a learner makes, and can readily convey which pronunciations need improvement. This paper presents the authors' implementation of such an ASR system, using deep neural networks as the acoustic model, and its use for detecting mispronunciations, analysing errors, and rendering corrective feedback. Quantitative as well as qualitative evaluations are carried out for German and Italian learners of English. In addition to achieving high accuracy in mispronunciation detection, the system also provides accurate diagnosis of errors.
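The idea that features define groups of phonemes, and that a mispronunciation can be diagnosed as a mismatch in individual features, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the small feature inventory, the ASCII phoneme labels (`th`, `dh`), and the helper names `diagnose` and `feedback` are hypothetical and are not taken from the paper. The paper's system estimates features from audio with a deep neural network; this sketch only compares symbolic feature sets.

```python
# Hypothetical, simplified feature inventory (an illustrative assumption,
# not the feature set used by the authors). "th" and "dh" are ASCII labels
# for the voiceless and voiced dental fricatives.
PHONEME_FEATURES = {
    "p": {"consonantal", "labial", "plosive"},
    "b": {"consonantal", "labial", "plosive", "voiced"},
    "t": {"consonantal", "coronal", "plosive"},
    "d": {"consonantal", "coronal", "plosive", "voiced"},
    "s": {"consonantal", "coronal", "fricative", "strident"},
    "z": {"consonantal", "coronal", "fricative", "strident", "voiced"},
    "th": {"consonantal", "coronal", "fricative"},
    "dh": {"consonantal", "coronal", "fricative", "voiced"},
}


def diagnose(target, produced):
    """Compare the features of the target and the produced phoneme.

    Returns the features the learner failed to realise ("missing") and the
    features that were realised but should not have been ("extra").
    """
    target_features = PHONEME_FEATURES[target]
    produced_features = PHONEME_FEATURES[produced]
    return {
        "missing": sorted(target_features - produced_features),
        "extra": sorted(produced_features - target_features),
    }


def feedback(target, produced):
    """Turn a feature-level diagnosis into a short corrective hint."""
    result = diagnose(target, produced)
    hints = [f"add [{feature}]" for feature in result["missing"]]
    hints += [f"remove [{feature}]" for feature in result["extra"]]
    if not hints:
        return f"/{produced}/ matches the target /{target}/ on all features."
    return f"Target /{target}/, heard /{produced}/: " + ", ".join(hints) + "."


if __name__ == "__main__":
    # Substituting /s/ for the dental fricative adds stridency.
    print(feedback("th", "s"))
    # Devoicing a final /d/ to /t/ loses the voicing feature.
    print(feedback("d", "t"))
```

Under these assumed feature sets, a learner who substitutes /s/ for the dental fricative is told which single feature differs, rather than only that the phoneme was wrong. A phoneme-level comparison would not expose this.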