IEEE Trans Neural Syst Rehabil Eng. 2018 Mar;26(3):637-645. doi: 10.1109/TNSRE.2018.2802914.
Assistive speech-based technologies can improve the quality of life for people affected by dysarthria, a motor speech disorder. In this paper, we explore multiple ways to improve Gaussian mixture model (GMM) and deep neural network (DNN) based hidden Markov model (HMM) automatic speech recognition systems for the TORGO dysarthric speech database. This work shows significant improvements over previous attempts at building such systems on TORGO. We trained speaker-specific acoustic models by tuning various acoustic model parameters, using speaker-normalized cepstral features, and building complex DNN-HMM models with dropout and sequence-discrimination strategies. The DNN-HMM models for severe and severe-moderate dysarthric speakers were further improved by using the generalized distillation framework to transfer specific information from dysarthric speech to DNN models trained on audio from both dysarthric and normal speech. To the best of our knowledge, this paper presents the best recognition accuracies reported for the TORGO database to date.