Regularized Speaker Adaptation of KL-HMM for Dysarthric Speech Recognition.

Publication information

IEEE Trans Neural Syst Rehabil Eng. 2017 Sep;25(9):1581-1591. doi: 10.1109/TNSRE.2017.2681691. Epub 2017 Mar 13.

Abstract

This paper addresses the problem of recognizing speech uttered by patients with dysarthria, a motor speech disorder that impedes the physical production of speech. Patients with dysarthria have articulatory limitations and therefore often have trouble pronouncing certain sounds, resulting in undesirable phonetic variation. Modern automatic speech recognition systems designed for regular speakers are ineffective for speakers with dysarthria because of this phonetic variation. To capture the variation, a Kullback-Leibler divergence-based hidden Markov model (KL-HMM) is adopted, in which the emission probability of each state is parameterized by a categorical distribution using phoneme posterior probabilities obtained from a deep neural network-based acoustic model. To further reflect speaker-specific phonetic variation patterns, a speaker adaptation method is proposed that combines L2 regularization with confusion-reducing regularization; it can enhance discriminability between the categorical distributions of the KL-HMM states while preserving speaker-specific information. The proposed method was evaluated on a database of several hundred words from 30 speakers (12 mildly dysarthric, 8 moderately dysarthric, and 10 non-dysarthric control speakers), and it significantly outperformed the conventional deep neural network-based speaker-adapted system on dysarthric as well as non-dysarthric speech.
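As a rough illustration of the KL-HMM idea described in the abstract, the following sketch scores one frame of phoneme posteriors against the categorical distributions of two states using a KL divergence. It is a minimal sketch under stated assumptions, not the paper's implementation: the direction of the divergence, the three-phoneme inventory, the function names `kl_divergence` and `local_scores`, and all numeric values are hypothetical, and the proposed regularized adaptation is not shown because the abstract does not give its form.

```python
import math

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) between two categorical distributions given as lists.

    eps guards against log(0); it is an implementation convenience,
    not something taken from the paper.
    """
    return sum(
        pi * math.log((pi + eps) / (qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0.0
    )

def local_scores(state_dists, posterior):
    """Cost of each state for one frame; a lower cost means a better match."""
    return [kl_divergence(y, posterior) for y in state_dists]

# Hypothetical categorical distributions for two KL-HMM states
# over a made-up inventory of three phoneme classes.
state_dists = [
    [0.8, 0.1, 0.1],  # state that mostly expects phoneme 0
    [0.1, 0.2, 0.7],  # state that mostly expects phoneme 2
]

# Stand-in for the DNN phoneme posterior vector at one frame.
frame_posterior = [0.7, 0.2, 0.1]

scores = local_scores(state_dists, frame_posterior)
best_state = min(range(len(scores)), key=scores.__getitem__)
print(scores, best_state)
```

In this toy setup the first state has the lower cost because its distribution is closer to the frame posterior. Because each state holds a full distribution over phonemes instead of a single label, a state can in principle absorb a speaker's systematic substitutions, which is the property the abstract relies on.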

Similar articles

1
Regularized Speaker Adaptation of KL-HMM for Dysarthric Speech Recognition.
IEEE Trans Neural Syst Rehabil Eng. 2017 Sep;25(9):1581-1591. doi: 10.1109/TNSRE.2017.2681691. Epub 2017 Mar 13.
2
Representation Learning Based Speech Assistive System for Persons With Dysarthria.
IEEE Trans Neural Syst Rehabil Eng. 2017 Sep;25(9):1510-1517. doi: 10.1109/TNSRE.2016.2638830. Epub 2016 Dec 13.
4
Improving Acoustic Models in TORGO Dysarthric Speech Database.
IEEE Trans Neural Syst Rehabil Eng. 2018 Mar;26(3):637-645. doi: 10.1109/TNSRE.2018.2802914.
6
Vocal tract representation in the recognition of cerebral palsied speech.
J Speech Lang Hear Res. 2012 Aug;55(4):1190-207. doi: 10.1044/1092-4388(2011/11-0223). Epub 2012 Jan 23.
7
Automated Speech Rate Measurement in Dysarthria.
J Speech Lang Hear Res. 2015 Jun;58(3):698-712. doi: 10.1044/2015_JSLHR-S-14-0242.
