Chandrakala S, Malini S, Veni S Vishnika
IEEE Trans Neural Syst Rehabil Eng. 2021;29:2425-2434. doi: 10.1109/TNSRE.2021.3125314. Epub 2021 Nov 25.
Building assistive speech technology is challenging because of the impaired nature of dysarthric speech, with characteristics such as breathy voice, strained speech, and distorted vowels and consonants. Learning compact and discriminative embeddings for dysarthric speech utterances is essential for impaired speech recognition. We propose a Histogram of States (HoS)-based approach that uses a Deep Neural Network-Hidden Markov Model (DNN-HMM) to learn word lattice-based compact and discriminative embeddings. The best state sequence chosen from the word lattice is used to represent each dysarthric speech utterance. A discriminative model-based classifier is then used to recognize these embeddings. The performance of the proposed approach is evaluated on three datasets: the 15 acoustically similar words and 100 common words datasets of the UA-SPEECH database, and a 50-word dataset of the TORGO database. The proposed HoS-based approach performs significantly better than the traditional Hidden Markov Model and DNN-HMM-based approaches on all three datasets. The discriminative ability and compactness of the proposed HoS-based embeddings yield the best impaired speech recognition accuracy among the compared approaches.
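As a rough illustration of the histogram-of-states idea, the sketch below turns a best state sequence into a fixed-length vector by counting how often each state occurs. It is a minimal sketch under stated assumptions, not the authors' implementation: the abstract does not specify how states are indexed, whether the histogram is normalized, or which classifier is used. The function name `histogram_of_states`, the integer state IDs, and the normalization step are all hypothetical.

```python
import numpy as np


def histogram_of_states(state_sequence, num_states, normalize=True):
    """Map a variable-length state sequence to a fixed-length count vector.

    Assumptions (not taken from the paper): states are integer IDs in
    [0, num_states), and the histogram is optionally normalized to sum to 1
    so that utterances of different durations are comparable.
    """
    states = np.asarray(state_sequence, dtype=np.int64)
    if states.size and (states.min() < 0 or states.max() >= num_states):
        raise ValueError("state IDs must lie in [0, num_states)")

    # Count occurrences of each state ID; minlength pads unseen states with 0.
    counts = np.bincount(states, minlength=num_states).astype(np.float64)

    total = counts.sum()
    if normalize and total > 0:
        counts /= total
    return counts


# Hypothetical best state sequence for one utterance (frame-level state IDs).
best_path = [0, 0, 1, 3, 3, 3]
embedding = histogram_of_states(best_path, num_states=4)
print(embedding)
```

The resulting vector has one dimension per state regardless of utterance length, which is what makes it usable as input to a fixed-input discriminative classifier.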