Explainable machine learning models based on multimodal time-series data for the early detection of Parkinson's disease.

Information Laboratory (InfoLab), Department of Electrical and Computer Engineering, Sungkyunkwan University, Suwon 16419, South Korea.

Technology Management, Stony Brook University, New York 11794, USA.

Comput Methods Programs Biomed. 2023 Jun;234:107495. doi: 10.1016/j.cmpb.2023.107495. Epub 2023 Mar 23.

BACKGROUND AND OBJECTIVES

Parkinson's disease (PD) is a devastating chronic neurological condition. Machine learning (ML) techniques have been used for the early prediction of PD progression. Fusing heterogeneous data modalities has been shown to improve the performance of ML models, and fusing time series data supports tracking the disease over time. In addition, adding model explainability features improves the trustworthiness of the resulting models. The PD literature has not sufficiently explored these three points: modality fusion, time series data, and explainability.

METHODS

In this work, we propose an ML pipeline for predicting the progression of PD that is both accurate and explainable. We explore the fusion of different combinations of five time series modalities from the Parkinson's Progression Markers Initiative (PPMI) real-world dataset: patient characteristics, biosamples, medication history, motor function, and non-motor function data. Each patient has six visits. The problem is formulated in two ways: (1) a three-class progression prediction with 953 patients in each time series modality, and (2) a four-class progression prediction with 1,060 patients in each time series modality. Statistical features of the six visits were calculated for each modality, and diverse feature selection methods were applied to select the most informative feature sets. The extracted features were used to train a set of well-known ML models, including support vector machines (SVM), random forests (RF), extra trees classifier (ETC), light gradient boosting machines (LGBM), and stochastic gradient descent (SGD). We examined a number of data-balancing strategies in the pipeline with different combinations of modalities. The ML models were optimized using a Bayesian optimizer. We conducted a comprehensive evaluation of the ML methods and extended the best models to provide different explainability features.
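The step of turning six visits into statistical features can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract does not say which statistics were used, so the mean, standard deviation, minimum, maximum, and linear trend below are assumptions, as are the function names, the `non_motor_score` measurement, and the made-up patient values.

```python
from statistics import mean, pstdev


def visit_features(visits):
    """Summarize one measurement tracked over a patient's visits.

    `visits` is a list of numeric scores in visit order.
    The choice of statistics here is an assumption for illustration.
    """
    n = len(visits)
    t_mean = (n - 1) / 2  # mean of the visit indices 0..n-1
    v_mean = mean(visits)
    # Least-squares slope of score against visit index (trend over time).
    num = sum((t - t_mean) * (v - v_mean) for t, v in enumerate(visits))
    den = sum((t - t_mean) ** 2 for t in range(n))
    return {
        "mean": v_mean,
        "std": pstdev(visits),
        "min": min(visits),
        "max": max(visits),
        "slope": num / den if den else 0.0,
    }


def patient_feature_vector(record):
    """Flatten {measurement: [visit values]} into {measurement_stat: value}."""
    return {
        f"{name}_{stat}": value
        for name, series in record.items()
        for stat, value in visit_features(series).items()
    }


# Hypothetical patient with six visits: made-up numbers, not PPMI data.
patient = {
    "NP3BRADY": [2, 3, 3, 4, 5, 7],
    "non_motor_score": [10, 10, 11, 11, 12, 12],
}
features = patient_feature_vector(patient)
```

Each patient then becomes one fixed-length row regardless of how the visits are spaced, which is what lets standard tabular classifiers such as RF or LGBM consume the time series.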

RESULTS

We compare the performance of the ML models before and after optimization, and with and without feature selection. In the three-class experiment with various modality fusions, the LGBM model produced the most accurate results, with a 10-fold cross-validation (10-CV) accuracy of 90.73% using the non-motor function modality. RF produced the best results in the four-class experiment with various modality fusions, with a 10-CV accuracy of 94.57% using the non-motor modality. With the fused dataset of non-motor and motor function modalities, the LGBM model outperformed the other ML models in both the three-class and four-class experiments (i.e., 10-CV accuracy of 94.89% and 93.73%, respectively). Using the Shapley Additive Explanations (SHAP) framework, we employed global and instance-based explanations to explain the behavior of each ML classifier. Moreover, we extended the explainability by implementing the LIME and SHAPASH local explainers, and we explored the consistency of these explainers. The resulting classifiers were accurate and explainable, and thus medically more relevant and applicable.
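To make the idea behind instance-based SHAP explanations concrete, the sketch below computes exact Shapley values for a single prediction by enumerating every feature coalition. It is an illustration under stated assumptions rather than the study's code: the SHAP library relies on faster model-specific algorithms instead of brute-force enumeration, and `toy_model`, the `age` and `tremor` features, and all numeric values here are hypothetical.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, instance, baseline):
    """Exact Shapley values for one instance.

    A feature inside a coalition takes its value from `instance`;
    a feature outside takes its value from `baseline`.
    """
    names = list(instance)
    n = len(names)

    def value(coalition):
        x = {k: (instance[k] if k in coalition else baseline[k]) for k in names}
        return predict(x)

    phi = {}
    for name in names:
        others = [k for k in names if k != name]
        total = 0.0
        for size in range(n):
            # Weight of a coalition of this size in the Shapley average.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {name}) - value(s))
        phi[name] = total
    return phi


def toy_model(x):
    """Hypothetical risk score, not a model from the study."""
    return 0.6 * x["NP3BRADY"] + 0.1 * x["age"] + 0.05 * x["NP3BRADY"] * x["tremor"]


# Made-up instance and reference point, not PPMI data.
instance = {"NP3BRADY": 6.0, "age": 70.0, "tremor": 2.0}
baseline = {"NP3BRADY": 2.0, "age": 60.0, "tremor": 0.0}
phi = shapley_values(toy_model, instance, baseline)
```

The attributions in `phi` sum to the difference between the prediction for the instance and the prediction for the baseline, which is the property that makes per-patient explanations additive. Enumeration costs grow exponentially with the number of features, so it is only practical for toy cases like this one.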

CONCLUSIONS

The selected modalities and feature sets were confirmed by the literature and by medical experts. The various explainers suggest that the bradykinesia (NP3BRADY) feature was the most dominant and consistent. By providing thorough insights into the influence of multiple modalities on disease risk, the proposed approach is expected to help improve clinical knowledge of PD progression processes.
