Using machine learning and electronic health record (EHR) data for the early prediction of Alzheimer's Disease and Related Dementias.

Authors

Akter Sonia, Liu Zhandi, Simoes Eduardo J, Rao Praveen

Affiliations

Institute for Data Science and Informatics, University of Missouri, USA.

Department of Electrical Engineering and Computer Science, University of Missouri, USA.

Publication information

J Prev Alzheimers Dis. 2025 Apr 16:100169. doi: 10.1016/j.tjpad.2025.100169.

Abstract

BACKGROUND

Over 6 million patients in the United States are affected by Alzheimer's Disease and Related Dementias (ADRD). Early detection of ADRD can significantly improve patient outcomes through timely treatment.

OBJECTIVE

To develop and validate machine learning (ML) models for early ADRD diagnosis and prediction using de-identified EHR data from the University of Missouri (MU) Healthcare.

DESIGN

Retrospective case-control study.

SETTING

The study used de-identified EHR data provided by MU NextGen Biomedical Informatics and structured according to the PCORnet Common Data Model (CDM).

PARTICIPANTS

An initial cohort of 380,269 patients aged 40 or older with at least two healthcare encounters was narrowed to a final dataset of 4,012 ADRD cases and 119,723 controls.
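The cohort counts above imply a strongly imbalanced dataset. The short calculation below is only an illustrative sketch using the three numbers reported in this abstract; the variable names are hypothetical, and the abstract does not say whether any resampling or class weighting was applied.

```python
# Counts reported in the abstract.
initial_cohort = 380_269
cases = 4_012
controls = 119_723

# Derived quantities (simple arithmetic on the reported counts).
final_total = cases + controls
excluded = initial_cohort - final_total
prevalence = cases / final_total
controls_per_case = controls / cases

# Accuracy of a trivial classifier that labels every patient a control.
all_control_accuracy = controls / final_total

print(f"final dataset size:      {final_total}")
print(f"excluded from cohort:    {excluded}")
print(f"case prevalence:         {prevalence:.4f}")
print(f"controls per case:       {controls_per_case:.2f}")
print(f"all-control accuracy:    {all_control_accuracy:.4f}")
```

With roughly 3% cases, a classifier that never predicts ADRD would still score about 97% accuracy, which is one reason imbalance-aware metrics such as AUC-ROC, sensitivity, and F1 are informative alongside accuracy.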

METHODS

Six ML classifiers were evaluated: Gradient-Boosted Trees (GBT), Light Gradient-Boosting Machine (LightGBM), Random Forest (RF), eXtreme Gradient-Boosting (XGBoost), Logistic Regression (LR), and Adaptive Boosting (AdaBoost). Performance was measured using Area Under the Receiver Operating Characteristic Curve (AUC-ROC), accuracy, sensitivity, specificity, and F1 score. SHAP (SHapley Additive exPlanations) analysis was applied to interpret the models' predictions.
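The evaluation metrics named above can be sketched in plain Python as follows. This is a minimal illustration on made-up toy labels and scores, not the study's evaluation code; the function names, the toy data, and the 0.5 decision threshold are all assumptions introduced here.

```python
from typing import Dict, Sequence


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, sensitivity, specificity, and F1 from binary labels (1 = case)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": tp / (tp + fn) if (tp + fn) else 0.0,  # true positive rate
        "specificity": tn / (tn + fp) if (tn + fp) else 0.0,  # true negative rate
        "f1": 2 * tp / (2 * tp + fp + fn) if (2 * tp + fp + fn) else 0.0,
    }


def auc_roc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC-ROC as the probability that a random case outscores a random control.

    Ties count as one half. This pairwise form is O(n_pos * n_neg), so it is
    only suitable for small illustrative inputs.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): 3 cases, 5 controls, with model scores in [0, 1].
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.7, 0.3, 0.8, 0.4, 0.2, 0.1, 0.05]
threshold = 0.5  # assumed cut-off; the abstract does not report one
y_pred = [1 if s >= threshold else 0 for s in scores]

metrics = classification_metrics(y_true, y_pred)
auc = auc_roc(y_true, scores)
print(metrics)
print(f"AUC-ROC: {auc:.3f}")
```

AUC-ROC is computed from the raw scores and does not depend on the threshold, whereas accuracy, sensitivity, specificity, and F1 all change when the cut-off moves.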

RESULTS

The GBT model achieved the best AUC-ROC scores, ranging from 0.809 to 0.833 across 1- to 5-year prediction windows. SHAP analysis identified depressive disorder, the 80-90 and 70-80 year age groups, heart disease, and anxiety as risk factors, along with the novel risk factors of sleep apnea and headache.
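To give a sense of how Shapley-based attribution ranks features, the sketch below computes exact Shapley values by brute force for a tiny two-feature toy model. Everything in it is hypothetical: the feature names, the coefficients, and the zero baseline are invented for illustration and are not taken from the study. In practice, SHAP libraries generally rely on faster approximations or model-specific algorithms rather than this enumeration.

```python
from itertools import combinations
from math import factorial
from typing import Callable, FrozenSet, List


def shapley_values(value_fn: Callable[[FrozenSet[int]], float], n: int) -> List[float]:
    """Exact Shapley values by enumerating every coalition (feasible only for small n)."""
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(len(others) + 1):
            # Weight of a coalition of size k that excludes feature i.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                phi[i] += weight * (value_fn(s | {i}) - value_fn(s))
    return phi


# Hypothetical toy risk score with an interaction term (made-up coefficients).
feature_names = ["depressive_disorder", "age_80_90"]
x = [1.0, 1.0]          # the patient being explained
baseline = [0.0, 0.0]   # assumed reference values for "absent" features


def model(v: List[float]) -> float:
    return 0.5 * v[0] + 0.3 * v[1] + 0.2 * v[0] * v[1]


def value_fn(present: FrozenSet[int]) -> float:
    """Model output when only the features in `present` take the patient's values."""
    v = [x[j] if j in present else baseline[j] for j in range(len(x))]
    return model(v)


phi = shapley_values(value_fn, len(x))
for name, contribution in zip(feature_names, phi):
    print(f"{name}: {contribution:.2f}")
```

The attributions sum to the difference between the model output for the patient and the output at the baseline, and the interaction term is split evenly between the two features.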

CONCLUSION

This study underscores the potential of ML models to leverage EHR data for early ADRD prediction, which could support timely interventions and improve patient outcomes. By identifying both established and novel risk factors, these findings offer new opportunities for personalized screening and management strategies, advancing both clinical and informatics science.
