Leveraging Deep Learning for Fine-Grained Categorization of Parkinson's Disease Progression Levels through Analysis of Vocal Acoustic Patterns.

Authors

Malekroodi Hadi Sedigh, Madusanka Nuwan, Lee Byeong-Il, Yi Myunggi

Affiliations

Industry 4.0 Convergence Bionics Engineering, Pukyong National University, Busan 48513, Republic of Korea.

Digital of Healthcare Research Center, Institute of Information Technology and Convergence, Pukyong National University, Busan 48513, Republic of Korea.

Publication

Bioengineering (Basel). 2024 Mar 21;11(3):295. doi: 10.3390/bioengineering11030295.

Abstract

Speech impairments often emerge as one of the primary indicators of Parkinson's disease (PD), albeit not readily apparent in its early stages. While previous studies focused predominantly on binary PD detection, this research explored the use of deep learning models to automatically classify sustained vowel recordings into healthy controls, mild PD, or severe PD based on motor symptom severity scores. Popular convolutional neural network (CNN) architectures (VGG and ResNet) and the Swin vision transformer were fine-tuned on log mel spectrogram image representations of the segmented voice data. The research also investigated how audio segment length and specific vowel sounds affect the performance of these models. The findings indicated that longer segments yielded better performance. The models showed strong capability in distinguishing PD from healthy subjects, achieving over 95% precision. However, reliably discriminating between mild and severe PD cases remained challenging. VGG16 achieved the best overall classification performance, with 91.8% accuracy and the largest area under the ROC curve. Furthermore, focusing the analysis on the vowel /u/ could further improve accuracy to 96%. Visualization techniques such as Grad-CAM also showed that the CNN models focused on localized spectrogram regions, whereas the transformers attended to more widespread patterns. Overall, this work showed the potential of deep learning for non-invasive screening and monitoring of PD progression from voice recordings, but larger multi-class labeled datasets are needed to further improve severity classification.
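The preprocessing the abstract describes (cutting sustained vowel recordings into fixed-length segments and turning each segment into a log mel spectrogram image) can be sketched as below. This is a minimal NumPy sketch under assumed parameters, not the authors' pipeline: the segment length, FFT size (1024), hop length (256), 64 mel bands, Hann window, HTK mel formula, and dropping of trailing samples are all illustrative choices, and the function names are hypothetical.

```python
import numpy as np


def segment_recording(signal, sr, seg_seconds):
    """Split a 1-D recording into non-overlapping segments of seg_seconds.

    Assumption: trailing samples that do not fill a whole segment are dropped.
    """
    seg_len = int(round(seg_seconds * sr))
    n_segs = len(signal) // seg_len
    return [signal[i * seg_len:(i + 1) * seg_len] for i in range(n_segs)]


def hz_to_mel(f):
    # HTK-style mel formula (an assumption; other mel variants exist).
    return 2595.0 * np.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(sr, n_fft, n_mels):
    """Triangular filters whose edges are evenly spaced on the mel scale."""
    fft_freqs = np.fft.rfftfreq(n_fft, d=1.0 / sr)  # (n_fft // 2 + 1,)
    mel_edges = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    hz_edges = mel_to_hz(mel_edges)
    fb = np.zeros((n_mels, fft_freqs.size))
    for m in range(n_mels):
        left, center, right = hz_edges[m], hz_edges[m + 1], hz_edges[m + 2]
        rising = (fft_freqs - left) / (center - left)
        falling = (right - fft_freqs) / (right - center)
        fb[m] = np.maximum(0.0, np.minimum(rising, falling))
    return fb


def log_mel_spectrogram(segment, sr, n_fft=1024, hop=256, n_mels=64, eps=1e-10):
    """Return a (n_mels, n_frames) log mel spectrogram in dB."""
    if len(segment) < n_fft:
        raise ValueError("segment is shorter than one FFT frame")
    window = np.hanning(n_fft)
    n_frames = 1 + (len(segment) - n_fft) // hop
    frames = np.stack([segment[t * hop:t * hop + n_fft] * window for t in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2         # (n_frames, n_fft // 2 + 1)
    mel_power = power @ mel_filterbank(sr, n_fft, n_mels).T  # (n_frames, n_mels)
    return 10.0 * np.log10(mel_power.T + eps)                # rows: mel bands, columns: time


# Usage: a synthetic 3 s, 150 Hz tone stands in for a real sustained vowel.
sr = 16000
t = np.arange(3 * sr) / sr
vowel_like = 0.5 * np.sin(2 * np.pi * 150.0 * t)
segments = segment_recording(vowel_like, sr, seg_seconds=1.0)
specs = [log_mel_spectrogram(s, sr) for s in segments]
print(len(segments), specs[0].shape)  # 3 (64, 59)
```

Raising `seg_seconds` gives fewer but wider spectrogram images, which is the variable whose effect the study examined. Feeding these arrays to ImageNet-pretrained models such as VGG, ResNet, or Swin would typically also require resizing, normalization, and replicating the single channel to three, steps this sketch omits.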

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/deda/10968564/b1ef6ee7c567/bioengineering-11-00295-g001.jpg
