Hossain Md Kamrul, Ashraf Afrina, Islam Md Mominul, Sourav Shoriful Hassan, Shimul Md Monir Hossain
Department of Computer Science and Engineering, Daffodil International University, Dhaka, Bangladesh.
Department of Public Health, Daffodil International University, Dhaka, Bangladesh.
Alzheimers Dement (Amst). 2025 Aug 8;17(3):e70162. doi: 10.1002/dad2.70162. eCollection 2025 Jul-Sep.
Alzheimer's disease (AD) is a progressive neurodegenerative disorder and the leading cause of dementia. Early diagnosis is vital. We developed an interpretable machine learning (ML) model for early AD prediction using open clinical data.
Data from 2149 adults (aged 60-90 years) were obtained from Kaggle. After preprocessing and feature engineering, tree-based models were trained, and a stacking ensemble combining Gradient Boosting and XGBoost was built with Logistic Regression as the meta-learner. SHapley Additive exPlanations (SHAP) provided interpretability. Performance was measured by accuracy, precision, recall, F1 score, the receiver operating characteristic (ROC) curve, and the area under the curve (AUC).
The stacked ensemble achieved 97% accuracy (AUC 0.97), with 0.97 precision, 0.94 recall, and 0.96 F1 score for AD. SHAP identified memory complaints, Mini-Mental State Examination (MMSE), functional assessment, behavioral symptoms, cholesterol, and lifestyle factors (activity, diet, sleep) as top predictors.
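For readers less familiar with the metrics reported above, the sketch below shows how accuracy, precision, recall, and F1 score follow from the four cells of a binary confusion matrix. The function name `classification_metrics` and the counts passed to it are hypothetical and chosen only for illustration; they are not taken from the study.

```python
def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Compute standard binary-classification metrics from confusion counts.

    tp/fp/fn/tn are true positives, false positives, false negatives,
    and true negatives for the positive (here: AD) class.
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    # Precision: of the cases predicted positive, the fraction truly positive.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: of the truly positive cases, the fraction that were detected.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1: harmonic mean of precision and recall.
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical counts, for illustration only (not the study's results).
metrics = classification_metrics(tp=90, fp=10, fn=30, tn=70)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Because F1 is the harmonic mean of precision and recall, it always lies between the two and sits closer to the smaller one.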
The ensemble model, enhanced by SHAP analysis, provides accurate and interpretable AD risk predictions with potential applicability in future clinical decision support systems.
Developed an ensemble machine learning (ML) model for early Alzheimer's disease (AD) prediction. Achieved 97% accuracy using stacked XGBoost and Gradient Boosting. SHapley Additive exPlanations (SHAP) analysis identified key cognitive and lifestyle-related risk factors. Model interprets AD risk using explainable artificial intelligence (AI) for clinical applicability. Utilized open-access dataset to ensure reproducibility and transparency.