Exploring Spectrogram-Based Audio Classification for Parkinson's Disease: A Study on Speech Classification and Qualitative Reliability Verification.

Affiliations

Department of AI & Informatics, Graduate School, Sangmyung University, Hongjimun 2-gil 20, Jongno-gu, Seoul 03016, Republic of Korea.

Department of Human-Centered Artificial Intelligence, Sangmyung University, Hongjimun 2-gil 20, Jongno-gu, Seoul 03016, Republic of Korea.

Publication information

Sensors (Basel). 2024 Jul 17;24(14):4625. doi: 10.3390/s24144625.

Abstract

Patients with Parkinson's disease experience voice impairment. In this study, we introduce models that classify healthy speakers and Parkinson's patients from their speech. We used two models: AST (audio spectrogram transformer), a transformer-based speech classification model that has recently outperformed CNN-based models in many fields, and PSLA (pretraining, sampling, labeling, and aggregation), a high-performance CNN-based model in the existing speech classification field. We compare and analyze the models from both quantitative and qualitative perspectives. First, quantitatively, PSLA outperformed AST by more than 4% in accuracy, and its AUC was also higher: 97.43% for PSLA versus 94.16% for AST. Second, we qualitatively evaluated how well the models capture the acoustic features of Parkinson's speech using several CAM (class activation map)-based XAI (eXplainable AI) methods, such as GradCAM and EigenCAM. For PSLA, we found that the model focuses well on the muffled frequency band of Parkinson's speech, and heatmap analysis of false positives and false negatives shows that the speech features are visually represented even when the model makes incorrect predictions. The contribution of this paper is twofold: by comparing two different types of models, we identified a suitable model for diagnosing Parkinson's disease through speech, and we also validated the model's predictions in practice.
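The quantitative comparison above rests on two metrics, accuracy and AUC. As a minimal sketch of how these are typically computed for a binary classifier, the following uses hypothetical toy labels and scores. The function names, the 0.5 threshold, and the data are assumptions for illustration, not the authors' evaluation code or results.

```python
def accuracy(labels, scores, threshold=0.5):
    """Fraction of recordings whose thresholded score matches the label."""
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


def roc_auc(labels, scores):
    """AUC as the probability that a random positive example is scored
    above a random negative example (ties count as half a win)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = Parkinson's speech, 0 = healthy control.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.1]

print(f"accuracy = {accuracy(labels, scores):.4f}")
print(f"AUC      = {roc_auc(labels, scores):.4f}")
```

Accuracy depends on the chosen decision threshold, whereas AUC summarizes ranking quality across all thresholds, which is why the two are usually reported together.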
