Explainable machine learning models based on multimodal time-series data for the early detection of Parkinson's disease.

Affiliations

Information Laboratory (InfoLab), Department of Electrical and Computer Engineering, Sungkyunkwan University, Suwon 16419, South Korea.

Technology Management, Stony Brook University, New York 11794, USA.

Publication information

Comput Methods Programs Biomed. 2023 Jun;234:107495. doi: 10.1016/j.cmpb.2023.107495. Epub 2023 Mar 23.


DOI:10.1016/j.cmpb.2023.107495
PMID:37003039
Abstract

BACKGROUND AND OBJECTIVES: Parkinson's disease (PD) is a devastating chronic neurological condition. Machine learning (ML) techniques have been used for the early prediction of PD progression. Fusing heterogeneous data modalities has been shown to improve the performance of ML models, time-series data fusion supports tracking the disease over time, and adding model explainability features improves the trustworthiness of the resulting models. The literature on PD has not sufficiently explored these three points.

METHODS: In this work, we propose an ML pipeline for predicting the progression of PD that is both accurate and explainable. We explore the fusion of different combinations of five time-series modalities from the Parkinson's Progression Markers Initiative (PPMI) real-world dataset: patient characteristics, biosamples, medication history, motor function data, and non-motor function data. Each patient has six visits. The problem is formulated in two ways: (1) a three-class progression prediction with 953 patients in each time-series modality, and (2) a four-class progression prediction with 1,060 patients in each time-series modality. Statistical features of the six visits were calculated for each modality, and diverse feature selection methods were applied to select the most informative feature sets. The extracted features were used to train a set of well-known ML models: support vector machines (SVM), random forests (RF), extra trees classifier (ETC), light gradient boosting machines (LGBM), and stochastic gradient descent (SGD). We examined a number of data-balancing strategies in the pipeline with different combinations of modalities. The ML models were optimized using a Bayesian optimizer. The various ML methods were evaluated comprehensively, and the best models were extended to provide different explainability features.
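The feature-extraction step in METHODS (statistical features computed over each patient's six visits, then fused across modalities) can be sketched as follows. This is a minimal illustration and not the authors' pipeline: the abstract does not say which statistics were used, so the mean, standard deviation, minimum, maximum, and linear trend below are assumptions, as are the function names, the `score_a` variable, and all example values. Only the feature name NP3BRADY comes from the abstract.

```python
from statistics import mean, pstdev


def visit_features(visits):
    """Summarise one variable over a patient's visits (chronological order).

    The chosen statistics are illustrative assumptions, not the paper's list.
    """
    n = len(visits)
    t_mean = (n - 1) / 2          # mean of the visit indices 0..n-1
    v_mean = mean(visits)
    # Least-squares slope of the score against the visit index (trend over time).
    denom = sum((t - t_mean) ** 2 for t in range(n))
    numer = sum((t - t_mean) * (v - v_mean) for t, v in enumerate(visits))
    slope = numer / denom if denom else 0.0
    return {
        "mean": v_mean,
        "std": pstdev(visits),
        "min": min(visits),
        "max": max(visits),
        "slope": slope,
    }


def patient_feature_vector(modalities):
    """Concatenate per-variable statistics from several modalities.

    `modalities` maps modality name -> {variable name -> list of visit values}.
    Joining the features of all modalities into one flat vector is one simple
    way to fuse them before feature selection and model training.
    """
    features = {}
    for modality, variables in modalities.items():
        for name, visits in variables.items():
            for stat, value in visit_features(visits).items():
                features[f"{modality}.{name}.{stat}"] = value
    return features


# Hypothetical patient with six visits; all values are made up.
example_patient = {
    "motor": {"NP3BRADY": [0, 1, 1, 2, 2, 3]},
    "non_motor": {"score_a": [5, 5, 6, 6, 7, 7]},
}
vector = patient_feature_vector(example_patient)
```

A table of such vectors, one row per patient, is the kind of input on which feature selection, data balancing, and the classifiers named above would then operate.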
RESULTS: We compared the performance of the ML models before and after optimization, and with and without feature selection. In the three-class experiment with various modality fusions, the LGBM model produced the most accurate results, with a 10-fold cross-validation (10-CV) accuracy of 90.73% using the non-motor function modality. In the four-class experiment with various modality fusions, RF produced the best results, with a 10-CV accuracy of 94.57% using the non-motor modality. With the fused dataset of non-motor and motor function modalities, the LGBM model outperformed the other ML models in both the three-class and four-class experiments (10-CV accuracy of 94.89% and 93.73%, respectively). Using the SHapley Additive exPlanations (SHAP) framework, we employed global and instance-based explanations to explain the behavior of each ML classifier. We further extended the explainability by implementing the LIME and SHAPASH local explainers, and we explored the consistency of these explainers. The resulting classifiers were accurate and explainable, and thus medically more relevant and applicable.

CONCLUSIONS: The selected modalities and feature sets were confirmed by the literature and by medical experts. The various explainers suggest that the bradykinesia (NP3BRADY) feature was the most dominant and consistent. By providing thorough insights into the influence of multiple modalities on disease risk, the suggested approach is expected to help improve clinical knowledge of PD progression processes.
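The instance-based explanations mentioned in RESULTS rest on Shapley values, which attribute a single prediction to its input features. The sketch below computes them exactly by enumerating feature coalitions. It does not use the SHAP library, which relies on faster model-specific and sampling-based estimators, and it is not the authors' code. The scoring function `toy_risk_score`, its coefficients, the feature names, and the input values are all hypothetical.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, instance, baseline):
    """Exact Shapley values of `predict` at `instance` relative to `baseline`.

    Features outside a coalition are replaced by their baseline value.
    The cost grows as 2**n, so this only suits a handful of features.
    """
    n = len(instance)

    def value(coalition):
        x = [instance[i] if i in coalition else baseline[i] for i in range(n)]
        return predict(x)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for a coalition of this size that excludes feature i.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                members = set(subset)
                # Marginal contribution of feature i when it joins the coalition.
                phi[i] += weight * (value(members | {i}) - value(members))
    return phi


def toy_risk_score(x):
    """Hypothetical score with an interaction term; not a clinical model."""
    bradykinesia, tremor, sleep = x
    return 2.0 * bradykinesia + 0.5 * tremor + 1.0 * sleep * bradykinesia


instance = [3.0, 1.0, 2.0]   # made-up feature values for one patient
baseline = [0.0, 0.0, 0.0]   # reference input standing in for "feature absent"
phi = shapley_values(toy_risk_score, instance, baseline)
```

The attributions sum to the difference between the prediction for the instance and the prediction for the baseline, and the interaction term is split equally between the two features involved. Averaging the absolute attributions over many patients gives the kind of global feature ranking in which a feature such as NP3BRADY could emerge as dominant.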


