Development of an explainable machine learning model for Alzheimer's disease prediction using clinical and behavioural features.

Author information

Govindarajan Rajkumar, Thirunadanasikamani K, Napa Komal Kumar, Sathya S, Murugan J Senthil, Priya K G Chandi

Affiliations

Department of Computer Science and Engineering, St. Peter's Institute of Higher Education and Research, Avadi, Chennai, India.

Department of Artificial Intelligence and Data Science, Saveetha Engineering College, Chennai, India.

Publication information

MethodsX. 2025 Jul 7;15:103491. doi: 10.1016/j.mex.2025.103491. eCollection 2025 Dec.

DOI:10.1016/j.mex.2025.103491

PMID:40697328

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12281133/

Abstract

This article presents a reproducible machine learning methodology for the early prediction of Alzheimer's disease (AD) from clinical and behavioural data. Multiple classification algorithms were compared, and the Gradient Boosting classifier performed best (accuracy: 93.9%, F1-score: 91.8%). To improve interpretability, SHapley Additive exPlanations (SHAP) were integrated into the workflow to quantify feature contributions at both the global and the individual level. SHAP identified and visualized the key predictive variables, including Mini-Mental State Examination (MMSE), Activities of Daily Living (ADL), cholesterol levels, and functional assessment scores. An interactive web application was developed with Streamlit, allowing real-time patient data input and transparent visualization of the model output. The method offers clinicians and researchers a practical tool to support early diagnosis and personalized risk assessment of AD, aiding timely and informed clinical decision-making.

Accurate Prediction: The Gradient Boosting model achieved 93.9% accuracy for early Alzheimer's detection.

Explainability: SHAP values provided interpretable insights into key clinical features.

Clinical Tool: A Streamlit-based web app enabled real-time, explainable predictions for users.
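To make the idea of individual-level feature contributions concrete, the following minimal sketch computes exact Shapley values by brute force for a single hypothetical patient. Everything in it is an assumption for illustration: `toy_risk_score` is an invented stand-in and not the article's trained Gradient Boosting model, and the feature values, baseline, and weights are made up. The SHAP library itself uses faster, model-specific algorithms for tree ensembles; this sketch only shows the quantity being estimated.

```python
from itertools import combinations
from math import factorial

# Feature names follow the abstract; all numbers below are invented.
FEATURES = ["MMSE", "ADL", "Cholesterol", "FunctionalAssessment"]


def toy_risk_score(x):
    """Hypothetical stand-in for a classifier's risk output (not the paper's model)."""
    score = 0.5
    score -= 0.02 * (x["MMSE"] - 24)
    score -= 0.03 * (x["ADL"] - 5)
    score += 0.001 * (x["Cholesterol"] - 200)
    # Interaction term, so contributions are not just the linear weights.
    if x["MMSE"] < 24 and x["FunctionalAssessment"] < 5:
        score += 0.1
    return score


def shapley_values(model, instance, baseline):
    """Exact Shapley values by enumerating every coalition of the other features.

    Features outside a coalition are held at their baseline value.
    """
    n = len(FEATURES)
    values = {}
    for feature in FEATURES:
        others = [f for f in FEATURES if f != feature]
        total = 0.0
        for size in range(n):
            # Shapley weight for a coalition of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for coalition in combinations(others, size):
                without = {
                    f: (instance[f] if f in coalition else baseline[f])
                    for f in FEATURES
                }
                with_feature = dict(without)
                with_feature[feature] = instance[feature]
                total += weight * (model(with_feature) - model(without))
        values[feature] = total
    return values


# Hypothetical reference patient and patient of interest.
baseline = {"MMSE": 27, "ADL": 8, "Cholesterol": 200, "FunctionalAssessment": 7}
patient = {"MMSE": 18, "ADL": 3, "Cholesterol": 240, "FunctionalAssessment": 2}

phi = shapley_values(toy_risk_score, patient, baseline)
for name, value in sorted(phi.items(), key=lambda kv: -abs(kv[1])):
    print(f"{name:>22}: {value:+.4f}")
```

The contributions sum to the difference between the patient's score and the baseline score, which is the additivity property that lets SHAP present a prediction as a sum of per-feature effects. Enumeration is exponential in the number of features, so it is only practical for toy cases like this one.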
