Vocal tract representation in the recognition of cerebral palsied speech.

Affiliation

University of Toronto, Ontario, Canada.

Publication information

J Speech Lang Hear Res. 2012 Aug;55(4):1190-207. doi: 10.1044/1092-4388(2011/11-0223). Epub 2012 Jan 23.

DOI:10.1044/1092-4388(2011/11-0223)

PMID:22271873

Abstract

PURPOSE

In this study, the authors explored articulatory information as a means of improving the recognition of dysarthric speech by machine.

METHOD

Data were derived chiefly from the TORGO database of dysarthric articulation (Rudzicz, Namasivayam, & Wolff, 2011) in which motions of various points in the vocal tract are measured during speech. In the 1st experiment, the authors provided a baseline model indicating a relatively low performance with traditional automatic speech recognition (ASR) using only acoustic data from dysarthric individuals. In the 2nd experiment, the authors used various measures of entropy (statistical disorder) to determine whether characteristics of dysarthric articulation can reduce uncertainty in features of dysarthric acoustics. These findings led to the 3rd experiment, in which recorded dysarthric articulation was directly encoded into the speech recognition process.
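The idea behind the 2nd experiment, asking how much uncertainty in the acoustics disappears once articulation is known, can be illustrated with a minimal sketch. The abstract does not say which entropy measures were used, so the version below assumes discrete (quantized) features and plain Shannon entropy; the function names and the toy data are hypothetical and are not drawn from the TORGO database.

```python
import math
from collections import Counter


def entropy(symbols):
    """Shannon entropy, in bits, of a sequence of discrete symbols."""
    counts = Counter(symbols)
    n = len(symbols)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def conditional_entropy(acoustic, articulatory):
    """H(acoustic | articulatory) for two paired discrete sequences."""
    n = len(acoustic)
    groups = {}
    # Group acoustic symbols by the articulatory symbol they co-occur with.
    for a, x in zip(acoustic, articulatory):
        groups.setdefault(x, []).append(a)
    # Weight each group's entropy by how often that articulatory symbol occurs.
    return sum(len(g) / n * entropy(g) for g in groups.values())


def fraction_removed(acoustic, articulatory):
    """Share of acoustic entropy removed by knowing articulation."""
    h = entropy(acoustic)
    return (h - conditional_entropy(acoustic, articulatory)) / h


# Hypothetical toy data: articulation fully determines the acoustic symbol.
acoustic = ["a", "b", "a", "b"]
articulatory = ["x", "y", "x", "y"]
print(fraction_removed(acoustic, articulatory))
```

A fraction near 1 means articulation almost fully determines the acoustic symbol, while a fraction near 0 means knowing it tells you almost nothing. Continuous features would need a density-based estimator instead, which this sketch does not attempt.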

RESULTS

The authors found that 18.3% of the statistical disorder in the acoustics of speakers with dysarthria can be removed if articulatory parameters are known. Using articulatory models yields a relative reduction in phoneme recognition errors of up to 6% for speakers with dysarthria in speaker-dependent systems.
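Because the 6% figure is a relative reduction, it is measured against the baseline error rate, not in absolute percentage points. The short sketch below shows the arithmetic. The error rates in it are hypothetical values chosen only for illustration, since the abstract does not report the underlying rates.

```python
def relative_error_reduction(baseline_error, new_error):
    """Fraction of the baseline error rate that the new system removes."""
    return (baseline_error - new_error) / baseline_error


# Hypothetical rates: a drop from 50% to 47% phoneme error is
# 3 percentage points in absolute terms, but about 6% in relative terms.
print(relative_error_reduction(0.50, 0.47))
```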

CONCLUSIONS

Articulatory knowledge is useful in reducing rates of error in ASR for speakers with dysarthria and in reducing statistical uncertainty of their acoustic signals. These findings may help to guide clinical decisions related to the use of ASR in the future.
