An Analytical Study of Speech Pathology Detection Based on MFCC and Deep Neural Networks.

Affiliations

Department of Computer Science, College of Computer and Information Sciences, King Saud University, P.O. Box 57168, Riyadh 21574, Saudi Arabia.

Division of Electronics Engineering, School of Engineering, Cochin University of Science and Technology, India.

Publication information

Comput Math Methods Med. 2022 Apr 4;2022:7814952. doi: 10.1155/2022/7814952. eCollection 2022.

DOI:10.1155/2022/7814952

PMID:35529259

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9071878/

Abstract

Diseases of internal organs other than the vocal folds can also affect a person's voice. Voice problems are becoming more common, yet they are frequently overlooked. According to a recent study, voice pathology detection systems can help assess voice abnormalities and enable the early diagnosis of voice pathology. In particular, automatic systems that distinguish healthy from pathological voices have received considerable attention for the early identification and diagnosis of voice problems, and artificial intelligence-assisted voice analysis therefore opens up new possibilities in healthcare. This work assesses the utility of several automatic speech signal analysis methods for diagnosing voice disorders and proposes a strategy for classifying healthy and pathological voices. The proposed framework combines three acoustic features: chroma, mel spectrogram, and mel-frequency cepstral coefficients (MFCCs). We also designed a deep neural network (DNN) that learns from the extracted features and produces a highly accurate voice-based disease prediction model. The study reports a series of experiments using the Saarbruecken Voice Database (SVD) to detect abnormal voices. The model was developed and tested using the vowels /a/, /i/, and /u/ produced at high, low, and average pitch. We also reserved the "continuous sentence" recordings from SVD to assess how well the model generalizes to completely new data. The highest accuracy achieved was 77.49%, higher than previous attempts in the same domain. When speaker gender information is incorporated, the model reaches an accuracy of 88.01%. When trained on selected diseases, the model reaches a maximum accuracy of 96.77% (cordectomy vs. healthy). The proposed framework is therefore the best fit for the healthcare industry.
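The abstract does not say how the three feature types are combined before they reach the DNN. A common approach, assumed here and not confirmed by the source, is to compute MFCC, log-mel spectrogram, and chroma frame by frame, average each over time, and concatenate the averages into one fixed-length vector. The NumPy sketch below illustrates that assumption. All function names and parameter values (16 kHz audio, 512-point FFT, 10 ms hop, 40 mel bands, 13 MFCCs) are hypothetical, and the HTK-style mel scale and nearest-semitone chroma mapping are simplified relative to libraries such as librosa.

```python
import numpy as np

# Minimal NumPy sketch (an assumption, not the paper's code): per-frame MFCC,
# log-mel spectrogram and chroma, mean-pooled over time and concatenated.
# All function names and parameter values here are hypothetical.


def hz_to_mel(f):
    # HTK-style mel scale; librosa defaults to a different (Slaney) variant
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)


def power_spectrogram(y, n_fft=512, hop=160):
    """Hann-windowed frames -> |FFT|^2, shape (frames, n_fft // 2 + 1)."""
    y = np.asarray(y, dtype=float)
    if len(y) < n_fft:
        y = np.pad(y, (0, n_fft - len(y)))
    n_frames = 1 + (len(y) - n_fft) // hop
    idx = np.arange(n_fft)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = y[idx] * np.hanning(n_fft)
    return np.abs(np.fft.rfft(frames, axis=1)) ** 2


def mel_filterbank(sr, n_fft, n_mels):
    """Triangular filters evenly spaced on the mel scale, shape (n_mels, n_fft // 2 + 1)."""
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / sr)
    edges = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2))
    fb = np.zeros((n_mels, freqs.size))
    for i in range(n_mels):
        lo, center, hi = edges[i], edges[i + 1], edges[i + 2]
        rising = (freqs - lo) / (center - lo)
        falling = (hi - freqs) / (hi - center)
        fb[i] = np.maximum(0.0, np.minimum(rising, falling))
    return fb


def dct2_ortho(x, n_out):
    """Orthonormal DCT-II along the last axis, keeping the first n_out coefficients."""
    n = x.shape[-1]
    k = np.arange(n_out)[:, None]
    m = np.arange(n)[None, :]
    basis = np.sqrt(2.0 / n) * np.cos(np.pi * k * (2 * m + 1) / (2 * n))
    basis[0] /= np.sqrt(2.0)
    return x @ basis.T


def chroma_from_power(spec, sr, n_fft):
    """Fold spectral energy into 12 pitch classes (index 0 = C, 9 = A), max-normalized per frame."""
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / sr)
    valid = freqs > 20.0  # skip DC and sub-audible bins
    midi = 69 + 12 * np.log2(freqs[valid] / 440.0)
    pitch_class = np.round(midi).astype(int) % 12
    spec_valid = spec[:, valid]
    chroma = np.zeros((spec.shape[0], 12))
    for pc in range(12):
        chroma[:, pc] = spec_valid[:, pitch_class == pc].sum(axis=1)
    return chroma / (chroma.max(axis=1, keepdims=True) + 1e-10)


def voice_features(y, sr, n_fft=512, hop=160, n_mels=40, n_mfcc=13):
    """One fixed-length vector: n_mfcc MFCC means + n_mels log-mel means + 12 chroma means."""
    spec = power_spectrogram(y, n_fft, hop)
    log_mel = np.log(spec @ mel_filterbank(sr, n_fft, n_mels).T + 1e-10)  # (frames, n_mels)
    mfcc = dct2_ortho(log_mel, n_mfcc)                                    # (frames, n_mfcc)
    chroma = chroma_from_power(spec, sr, n_fft)                           # (frames, 12)
    return np.concatenate([mfcc.mean(axis=0), log_mel.mean(axis=0), chroma.mean(axis=0)])


if __name__ == "__main__":
    sr = 16000
    t = np.arange(sr) / sr
    # Synthetic 1 s, 220 Hz tone standing in for a sustained vowel recording
    y = 0.5 * np.sin(2 * np.pi * 220.0 * t)
    feats = voice_features(y, sr)
    print(feats.shape)  # (65,)
```

On the synthetic 220 Hz tone the vector has 13 + 40 + 12 = 65 entries, and the largest chroma entry falls at index 9 (pitch class A). Mean pooling discards temporal order but gives every recording the same input size, which suits a fully connected DNN; a real pipeline would run on the SVD vowel recordings and would more likely rely on a tested library implementation.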

Similar articles

Deep learning in automatic detection of dysphonia: Comparing acoustic features and developing a generalizable framework.

Int J Lang Commun Disord. 2023 Mar;58(2):279-294. doi: 10.1111/1460-6984.12783. Epub 2022 Sep 18.

Neurogenerative Disease Diagnosis in Cepstral Domain Using MFCC with Deep Learning.

Comput Math Methods Med. 2022 Apr 4;2022:4364186. doi: 10.1155/2022/4364186. eCollection 2022.

Intra- and Inter-database Study for Arabic, English, and German Databases: Do Conventional Speech Features Detect Voice Pathology?

J Voice. 2017 May;31(3):386.e1-386.e8. doi: 10.1016/j.jvoice.2016.09.009. Epub 2016 Oct 10.

Unraveling the complexities of pathological voice through saliency analysis.

Comput Biol Med. 2023 Nov;166:107566. doi: 10.1016/j.compbiomed.2023.107566. Epub 2023 Oct 14.

The Effect of the MFCC Frame Length in Automatic Voice Pathology Detection.

J Voice. 2024 Sep;38(5):975-982. doi: 10.1016/j.jvoice.2022.03.021. Epub 2022 Apr 27.

Investigation of Voice Pathology Detection and Classification on Different Frequency Regions Using Correlation Functions.

J Voice. 2017 Jan;31(1):3-15. doi: 10.1016/j.jvoice.2016.01.014. Epub 2016 Mar 15.

Discrimination between pathological and normal voices using GMM-SVM approach.

J Voice. 2011 Jan;25(1):38-43. doi: 10.1016/j.jvoice.2009.08.002. Epub 2010 Feb 4.

Voice pathology detection using optimized convolutional neural networks and explainable artificial intelligence-based analysis.

Comput Methods Biomech Biomed Engin. 2024 Nov;27(14):2041-2057. doi: 10.1080/10255842.2023.2270102. Epub 2023 Oct 18.

Voice pathology detection and classification from speech signals and EGG signals based on a multimodal fusion method.

Biomed Tech (Berl). 2021 Nov 29;66(6):613-625. doi: 10.1515/bmt-2021-0112. Print 2021 Dec 20.

Cited by

A safe and effective protocol for postdilution hemofiltration with regional citrate anticoagulation.

BMC Nephrol. 2024 Jul 9;25(1):218. doi: 10.1186/s12882-024-03659-y.

Retracted: An Analytical Study of Speech Pathology Detection Based on MFCC and Deep Neural Networks.

Comput Math Methods Med. 2023 Dec 13;2023:9829813. doi: 10.1155/2023/9829813. eCollection 2023.

A novel hybrid model integrating MFCC and acoustic parameters for voice disorder detection.

Sci Rep. 2023 Dec 20;13(1):22719. doi: 10.1038/s41598-023-49869-6.

An Experimental Analysis on Multicepstral Projection Representation Strategies for Dysphonia Detection.

Sensors (Basel). 2023 May 30;23(11):5196. doi: 10.3390/s23115196.

References

Enhanced Living by Assessing Voice Pathology Using a Co-Occurrence Matrix.

Sensors (Basel). 2017 Jan 29;17(2):267. doi: 10.3390/s17020267.

Speech disorders in Parkinson's disease: early diagnostics and effects of medication and brain stimulation.

J Neural Transm (Vienna). 2017 Mar;124(3):303-334. doi: 10.1007/s00702-017-1676-0. Epub 2017 Jan 18.

Mobile Communication Devices, Ambient Noise, and Acoustic Voice Measures.

J Voice. 2017 Mar;31(2):248.e11-248.e23. doi: 10.1016/j.jvoice.2016.07.023. Epub 2016 Sep 29.

An Investigation of Multidimensional Voice Program Parameters in Three Different Databases for Voice Pathology Detection and Classification.

J Voice. 2017 Jan;31(1):113.e9-113.e18. doi: 10.1016/j.jvoice.2016.03.019. Epub 2016 Apr 19.

Investigation of Voice Pathology Detection and Classification on Different Frequency Regions Using Correlation Functions.

J Voice. 2017 Jan;31(1):3-15. doi: 10.1016/j.jvoice.2016.01.014. Epub 2016 Mar 15.

Voice Disorder Classification Based on Multitaper Mel Frequency Cepstral Coefficients Features.

Comput Math Methods Med. 2015;2015:956249. doi: 10.1155/2015/956249. Epub 2015 Nov 22.

Voice data mining for laryngeal pathology assessment.

Comput Biol Med. 2016 Feb 1;69:270-6. doi: 10.1016/j.compbiomed.2015.07.026. Epub 2015 Aug 10.

On combining information from modulation spectra and mel-frequency cepstral coefficients for automatic detection of pathological voices.

Logoped Phoniatr Vocol. 2011 Jul;36(2):60-9. doi: 10.3109/14015439.2010.528788. Epub 2010 Nov 12.

Using modulation spectra for voice pathology detection and classification.

Annu Int Conf IEEE Eng Med Biol Soc. 2009;2009:2514-7. doi: 10.1109/IEMBS.2009.5334850.

Voice assessment: updates on perceptual, acoustic, aerodynamic, and endoscopic imaging methods.

Curr Opin Otolaryngol Head Neck Surg. 2008 Jun;16(3):211-5. doi: 10.1097/MOO.0b013e3282fe96ce.