Non-invasive detection of Parkinson's disease based on speech analysis and interpretable machine learning.

Authors

Xu Huanqing, Xie Wei, Pang Mingzhen, Li Ya, Jin Luhua, Huang Fangliang, Shao Xian

Affiliations

The School of Medical Information Engineering, Anhui University of Chinese Medicine, Hefei, China.

Jiangxi Medical College, Nanchang University, Nanchang, Jiangxi, China.

Publication information

Front Aging Neurosci. 2025 Apr 30;17:1586273. doi: 10.3389/fnagi.2025.1586273. eCollection 2025.

DOI:10.3389/fnagi.2025.1586273

PMID:40370753

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12075230/

Abstract

OBJECTIVE

Parkinson's disease (PD) is a progressive neurodegenerative disorder that significantly impacts motor function and speech patterns. Early detection of PD through non-invasive methods, such as speech analysis, can improve treatment outcomes and quality of life for patients. This study aims to develop an interpretable machine learning model that uses speech recordings and acoustic features to predict PD.

METHODS

A dataset of speech recordings from individuals with and without PD was analyzed. The dataset includes features such as fundamental frequency (Fo), jitter, shimmer, noise-to-harmonics ratio (NHR), and non-linear dynamic complexity measures. Exploratory data analysis (EDA) was conducted to identify patterns and relationships in the data. The dataset was split into 70% training and 30% testing sets. To address class imbalance, synthetic minority oversampling technique (SMOTE) was applied. Several machine learning algorithms, including K-Nearest Neighbors (KNN), Support Vector Machine (SVM), Decision Trees, Random Forests, and Neural Networks, were implemented and evaluated. Model performance was assessed using accuracy, recall, F1 score, and area under the receiver operating characteristic curve (AUC-ROC) metrics. SHapley Additive exPlanations (SHAP) were used to explain the models and evaluate feature contributions.
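The split-then-oversample step described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' code: the function names `stratified_split` and `smote`, the toy two-feature data, `k=3`, and the random seeds are all hypothetical, and the sketch assumes oversampling is applied to the training portion only, so that the test set contains no synthetic samples.

```python
import random


def stratified_split(X, y, test_fraction=0.3, seed=0):
    """Split rows into train/test while keeping class proportions similar."""
    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for label in sorted(set(y)):
        idx = [i for i, v in enumerate(y) if v == label]
        rng.shuffle(idx)
        n_test = round(len(idx) * test_fraction)
        test_idx += idx[:n_test]
        train_idx += idx[n_test:]

    def pick(ids):
        return [X[i] for i in ids], [y[i] for i in ids]

    return pick(train_idx), pick(test_idx)


def smote(X, y, k=3, seed=0):
    """Balance classes by interpolating between minority-class neighbours."""
    rng = random.Random(seed)
    counts = {label: y.count(label) for label in sorted(set(y))}
    minority = min(counts, key=counts.get)
    deficit = max(counts.values()) - counts[minority]
    pts = [x for x, label in zip(X, y) if label == minority]
    X_new, y_new = list(X), list(y)
    for _ in range(deficit):
        base = rng.choice(pts)
        # k nearest minority-class neighbours of `base`, excluding itself
        neighbours = sorted(
            (p for p in pts if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )[:k]
        other = rng.choice(neighbours)
        t = rng.random()  # position along the segment base -> other
        X_new.append([a + t * (b - a) for a, b in zip(base, other)])
        y_new.append(minority)
    return X_new, y_new


# Toy data (hypothetical): 20 samples labelled 1 and 10 labelled 0.
data_rng = random.Random(42)
X = [[data_rng.gauss(1.0, 0.2), data_rng.gauss(0.5, 0.1)] for _ in range(20)]
X += [[data_rng.gauss(0.3, 0.1), data_rng.gauss(0.2, 0.05)] for _ in range(10)]
y = [1] * 20 + [0] * 10

(train_X, train_y), (test_X, test_y) = stratified_split(X, y)
bal_X, bal_y = smote(train_X, train_y)  # oversample the training set only
```

Each synthetic sample lies on the line segment between two real minority-class samples, so it stays inside the range of the observed minority data. In practice an established library implementation of SMOTE would normally be preferred over a hand-written one.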

RESULTS

The analysis revealed that features related to speech instability, such as jitter, shimmer, and NHR, were highly predictive of PD. Non-linear measures, including Recurrence Period Density Entropy (RPDE) and Pitch Period Entropy (PPE), also contributed substantially to the model's predictive power. Random Forest and Gradient Boosting models achieved the highest performance, with an AUC-ROC of 0.98 and a recall of 0.95, indicating few false negatives. SHAP values highlighted the importance of fundamental frequency variation and harmonic-to-noise ratio in distinguishing PD patients from healthy individuals.
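To make the reported metrics concrete, the sketch below shows how recall, F1 score, accuracy, and AUC-ROC relate to a classifier's outputs. The labels, scores, and the 0.5 decision threshold are invented toy values, not data from the study, and the helper names are hypothetical. AUC is computed here through its rank interpretation: the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels, where 1 is the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


def auc_roc(y_true, scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy example (hypothetical values): 4 positive and 4 negative cases.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
recall = tp / (tp + fn)              # share of true positives that are detected
f1 = 2 * tp / (2 * tp + fp + fn)     # harmonic mean of precision and recall
accuracy = (tp + tn) / len(y_true)
auc = auc_roc(y_true, scores)

print(f"recall={recall:.2f} f1={f1:.2f} accuracy={accuracy:.2f} auc={auc:.4f}")
```

Recall is the metric tied directly to false negatives: a recall of 0.95 means that about 5% of true positive cases are missed at the chosen threshold, whereas AUC-ROC summarizes ranking quality across all thresholds.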

CONCLUSION

The developed machine learning model accurately predicts Parkinson's disease using speech recordings, with Random Forest and Gradient Boosting algorithms demonstrating superior performance. Key predictive features include jitter, shimmer, and non-linear dynamic complexity measures. This study provides a reliable, non-invasive tool for early PD detection and underscores the potential of speech analysis in diagnosing neurodegenerative diseases.
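For readers unfamiliar with the perturbation features named above, the following sketch shows one common definition of local jitter and local shimmer: the mean absolute difference between consecutive glottal cycles, relative to the mean value. The abstract does not state which jitter and shimmer variants were used, so this definition, the function name `local_perturbation`, and the toy period and amplitude values are illustrative assumptions.

```python
def local_perturbation(values):
    """Mean absolute difference of consecutive values, divided by the mean value.

    Applied to cycle periods this gives local jitter; applied to cycle
    peak amplitudes it gives local shimmer.
    """
    if len(values) < 2:
        raise ValueError("need at least two cycles")
    diffs = [abs(b - a) for a, b in zip(values, values[1:])]
    return (sum(diffs) / len(diffs)) / (sum(values) / len(values))


# Hypothetical cycle periods in seconds (about 125 Hz) and peak amplitudes.
periods_steady = [0.0080] * 6
periods_unsteady = [0.0080, 0.0084, 0.0078, 0.0083, 0.0079, 0.0085]
amplitudes_unsteady = [0.50, 0.46, 0.53, 0.47, 0.52, 0.45]

jitter_steady = local_perturbation(periods_steady)
jitter_unsteady = local_perturbation(periods_unsteady)
shimmer_unsteady = local_perturbation(amplitudes_unsteady)

print(f"jitter (steady):   {jitter_steady:.4f}")
print(f"jitter (unsteady): {jitter_unsteady:.4f}")
print(f"shimmer (unsteady): {shimmer_unsteady:.4f}")
```

A perfectly periodic voice yields zero for both measures; larger cycle-to-cycle variation in period or amplitude raises them, which is why they serve as indicators of vocal instability.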

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a153/12075230/ed3035285a79/fnagi-17-1586273-g001.jpg
