Automated Dysarthria Severity Classification: A Study on Acoustic Features and Deep Learning Techniques.

Publication information

IEEE Trans Neural Syst Rehabil Eng. 2022;30:1147-1157. doi: 10.1109/TNSRE.2022.3169814. Epub 2022 May 4.

Abstract

Assessing the severity level of dysarthria can provide insight into a patient's improvement, help pathologists plan therapy, and aid automatic dysarthric speech recognition systems. In this article, we present a comparative study on the classification of dysarthria severity levels using different deep learning techniques and acoustic features. First, we evaluate basic architectural choices, namely deep neural network (DNN), convolutional neural network, gated recurrent unit, and long short-term memory network models, using basic speech features: Mel-frequency cepstral coefficients (MFCCs) and constant-Q cepstral coefficients. Next, speech-disorder-specific features computed from prosody, articulation, phonation, and glottal functioning are evaluated on DNN models. Finally, we explore the utility of low-dimensional feature representations obtained by subspace modeling, which yields i-vectors that are then classified using DNN models. Evaluation is done using the standard UA-Speech and TORGO databases. The DNN classifier using MFCC-based i-vectors outperforms the other systems, giving an accuracy of 93.97% under the speaker-dependent scenario and 49.22% under the speaker-independent scenario on the UA-Speech database.
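
The gap between the speaker-dependent and speaker-independent accuracies comes down to how the data are partitioned. The abstract does not say how the authors built their splits, so the following is only a minimal sketch of one common reading of the two scenarios. The toy corpus, the speaker IDs, the integer severity labels, and all function names are hypothetical and are not taken from UA-Speech, TORGO, or the paper.

```python
import random
from collections import defaultdict


def make_toy_corpus(n_speakers=8, utts_per_speaker=10):
    """Build a hypothetical corpus of (speaker_id, severity_label, utterance_id) tuples."""
    corpus = []
    for s in range(n_speakers):
        label = s % 4  # made-up severity classes 0..3, one class per speaker
        for u in range(utts_per_speaker):
            corpus.append((f"spk{s:02d}", label, f"spk{s:02d}_utt{u:02d}"))
    return corpus


def speaker_dependent_split(corpus, test_fraction=0.2, seed=0):
    """Hold out part of every speaker's utterances, so each speaker is in both sets."""
    rng = random.Random(seed)
    by_speaker = defaultdict(list)
    for item in corpus:
        by_speaker[item[0]].append(item)
    train, test = [], []
    for items in by_speaker.values():
        items = items[:]  # copy before shuffling
        rng.shuffle(items)
        n_test = max(1, int(len(items) * test_fraction))
        test.extend(items[:n_test])
        train.extend(items[n_test:])
    return train, test


def speaker_independent_split(corpus, test_speakers):
    """Hold out whole speakers, so no test speaker is seen during training."""
    held_out = set(test_speakers)
    train = [item for item in corpus if item[0] not in held_out]
    test = [item for item in corpus if item[0] in held_out]
    return train, test


def speakers(items):
    """Return the set of speaker IDs present in a list of corpus items."""
    return {spk for spk, _, _ in items}


if __name__ == "__main__":
    corpus = make_toy_corpus()
    sd_train, sd_test = speaker_dependent_split(corpus)
    si_train, si_test = speaker_independent_split(
        corpus, ["spk04", "spk05", "spk06", "spk07"]
    )
    print("speaker-dependent overlap:", sorted(speakers(sd_train) & speakers(sd_test)))
    print("speaker-independent overlap:", sorted(speakers(si_train) & speakers(si_test)))
```

In the speaker-dependent split a classifier can rely on speaker-specific cues it has already seen, whereas in the speaker-independent split it has to generalize to unseen voices, which is consistent with the lower accuracy reported for that scenario.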
