HypernasalityNet: Deep recurrent neural network for automatic hypernasality detection.

Affiliations

College of Electrical Engineering and Information Technology, Sichuan University, 610065, China.

Hospital of Stomatology, Sichuan University, 610065, China.

Publication information

Int J Med Inform. 2019 Sep;129:1-12. doi: 10.1016/j.ijmedinf.2019.05.023. Epub 2019 May 23.

Abstract

BACKGROUND

Patients with cleft palate are unable to achieve adequate velopharyngeal closure, which results in hypernasal speech. In clinical practice, hypernasal speech is assessed subjectively by speech-language pathologists. Automatic hypernasal speech detection can provide diagnostic support for speech-language pathologists and clinicians.

OBJECTIVES

This study aims to develop a Long Short-Term Memory (LSTM) based Deep Recurrent Neural Network (DRNN) system that detects hypernasal speech in cleft palate patients, and thereby provides diagnostic support for clinical surgery and speech therapy. The feature mining and classification abilities of the LSTM-DRNN system are also explored.

METHODS

The dataset comprises 14,544 Mandarin vowel recordings collected from 144 children aged 5-12 years (72 with hypernasality and 72 controls). This work proposes an LSTM-based DRNN system for automatic hypernasal speech detection, since an LSTM-DRNN can learn the short-time dependencies of hypernasal speech. Vocal tract-based features are fed into the LSTM-DRNN for deep feature mining. To verify the feature mining ability of the LSTM-DRNN, the features it projects are fed into shallow classifiers in place of the two fully connected layers and the softmax layer that normally follow. For comparison, the same features without LSTM-DRNN projection are fed directly into the shallow classifiers. Hypernasality-sensitive vowels (/a/, /i/, and /u/) are analyzed for the first time.
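The pipeline described above (stacked LSTM layers, then two fully connected layers and a softmax) can be illustrated with a minimal forward-pass sketch in plain Python. This is not the authors' implementation: the class name HypernasalitySketch, the 13-dimensional input frames, the layer sizes, the tanh activation, and the random untrained weights are all assumptions chosen for illustration, and the abstract does not specify any of them. The project method returns the last hidden state of the top LSTM layer, which is one plausible reading of the "projected features" that the study passes to shallow classifiers.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(W, v):
    """Multiply matrix W (list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def dense(W, b, v, activation=None):
    """Fully connected layer: W v + b, optionally followed by an activation."""
    out = [s + bj for s, bj in zip(matvec(W, v), b)]
    return [activation(s) for s in out] if activation else out


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


class LSTMLayer:
    """Standard LSTM cell unrolled over a sequence of feature frames."""

    def __init__(self, n_in, n_hidden, rng):
        self.n_hidden = n_hidden
        # One weight matrix per gate (input, forget, output, candidate),
        # each applied to the concatenation [x_t ; h_{t-1}].
        self.W = {g: rand_matrix(n_hidden, n_in + n_hidden, rng) for g in "ifog"}
        self.b = {g: [0.0] * n_hidden for g in "ifog"}

    def forward(self, frames):
        h = [0.0] * self.n_hidden
        c = [0.0] * self.n_hidden
        outputs = []
        for x in frames:
            z = x + h  # list concatenation of current frame and previous state
            pre = {k: dense(self.W[k], self.b[k], z) for k in "ifog"}
            i = [sigmoid(v) for v in pre["i"]]
            f = [sigmoid(v) for v in pre["f"]]
            o = [sigmoid(v) for v in pre["o"]]
            g = [math.tanh(v) for v in pre["g"]]
            c = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
            h = [oj * math.tanh(cj) for oj, cj in zip(o, c)]
            outputs.append(h)
        return outputs


class HypernasalitySketch:
    """Hypothetical model: stacked LSTM -> 2 fully connected layers -> softmax."""

    def __init__(self, n_features=13, n_hidden=8, n_layers=2, n_fc=8, seed=0):
        rng = random.Random(seed)
        sizes = [n_features] + [n_hidden] * n_layers
        self.lstm = [LSTMLayer(sizes[k], sizes[k + 1], rng) for k in range(n_layers)]
        self.W1, self.b1 = rand_matrix(n_fc, n_hidden, rng), [0.0] * n_fc
        self.W2, self.b2 = rand_matrix(2, n_fc, rng), [0.0] * 2

    def project(self, frames):
        """Return the top LSTM layer's final hidden state (the projected features)."""
        seq = frames
        for layer in self.lstm:
            seq = layer.forward(seq)
        return seq[-1]

    def predict_proba(self, frames):
        """Return [P(class 0), P(class 1)]; the class order is arbitrary here."""
        hidden = dense(self.W1, self.b1, self.project(frames), math.tanh)
        return softmax(dense(self.W2, self.b2, hidden))


# Synthetic stand-in for one vowel: 20 frames of 13 made-up feature values.
demo_rng = random.Random(1)
frames = [[demo_rng.gauss(0.0, 1.0) for _ in range(13)] for _ in range(20)]
model = HypernasalitySketch()
features = model.project(frames)     # would go to a shallow classifier
probs = model.predict_proba(frames)  # output of the full network
```

Because the weights are random, the probabilities carry no diagnostic meaning. The sketch only shows how data moves through the network and where the projected features would be taken off for the shallow-classifier comparison.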

RESULTS

The LSTM-DRNN-based method achieves higher detection accuracy than shallow classifiers, since the LSTM-DRNN mines features along the time axis and through network depth simultaneously. The proposed system reaches a highest accuracy of 93.35%. The analysis of hypernasality-sensitive vowels indicates that /i/ and /u/ are the vowels most sensitive to hypernasality.

CONCLUSIONS

The results show that the LSTM-DRNN has robust feature mining and classification abilities. This is the first work to apply the LSTM-DRNN technique to the automatic detection of hypernasality in cleft palate speech. The experimental results demonstrate the potential of deep learning for pathological speech detection.
