University of Toronto, Ontario, Canada.
J Speech Lang Hear Res. 2012 Aug;55(4):1190-207. doi: 10.1044/1092-4388(2011/11-0223). Epub 2012 Jan 23.
In this study, the authors explored articulatory information as a means of improving the recognition of dysarthric speech by machine.
Data were derived chiefly from the TORGO database of dysarthric articulation (Rudzicz, Namasivayam, & Wolff, 2011), in which motions of various points in the vocal tract are measured during speech. In the 1st experiment, the authors established a baseline model that showed relatively low performance for traditional automatic speech recognition (ASR) using only acoustic data from dysarthric individuals. In the 2nd experiment, the authors used various measures of entropy (statistical disorder) to determine whether characteristics of dysarthric articulation can reduce uncertainty in features of dysarthric acoustics. These findings led to the 3rd experiment, in which recorded dysarthric articulation was directly encoded into the speech recognition process.
The authors found that 18.3% of the statistical disorder in the acoustics of speakers with dysarthria can be removed if articulatory parameters are known. In speaker-dependent systems, using articulatory models yields a relative reduction in phoneme recognition errors of up to 6% for speakers with dysarthria.
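To make the two quantities in these results concrete, the following is a minimal sketch of how a "fraction of statistical disorder removed" and a "relative error reduction" can be computed. It assumes discrete toy distributions and Shannon entropy; the abstract does not specify the entropy measures or estimators the authors used, so the function names (`fraction_removed`, `relative_error_reduction`) and the example numbers are hypothetical illustrations, not the study's method or data.

```python
import math


def entropy(probs):
    """Shannon entropy, in bits, of a discrete distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def conditional_entropy(joint):
    """H(X | Y) in bits, where joint[y][x] = P(X = x, Y = y).

    Here X stands in for an acoustic feature and Y for an articulatory
    parameter (an assumption made purely for illustration).
    """
    h = 0.0
    for row in joint:
        p_y = sum(row)
        if p_y > 0:
            h += p_y * entropy([p / p_y for p in row])
    return h


def fraction_removed(joint):
    """Fraction of H(X) removed by knowing Y: (H(X) - H(X|Y)) / H(X)."""
    n_x = len(joint[0])
    p_x = [sum(row[x] for row in joint) for x in range(n_x)]
    h_x = entropy(p_x)
    return (h_x - conditional_entropy(joint)) / h_x


def relative_error_reduction(baseline_error, new_error):
    """Relative (not absolute) reduction in an error rate."""
    return (baseline_error - new_error) / baseline_error


# Toy joint distribution (hypothetical numbers): Y is partly informative about X.
toy_joint = [
    [0.4, 0.1],
    [0.1, 0.4],
]
print(f"entropy removed: {fraction_removed(toy_joint):.3f}")
print(f"relative error reduction: {relative_error_reduction(0.50, 0.47):.3f}")
```

Under this reading, a value of 0.183 from `fraction_removed` would correspond to the 18.3% figure, and a drop in phoneme error rate from, say, 50% to 47% would correspond to a 6% relative reduction even though the absolute change is only 3 percentage points.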
Articulatory knowledge is useful in reducing rates of error in ASR for speakers with dysarthria and in reducing statistical uncertainty of their acoustic signals. These findings may help to guide clinical decisions related to the use of ASR in the future.