Ports Kayleen, Dai Jiahui, Conniff Kyle, Corrada Maria M, Manson Spero M, O'Connell Joan, Jiang Luohua
Department of Epidemiology & Biostatistics, Joe C. Wen School of Population & Public Health, Susan and Henry Samueli College of Health Sciences, University of California, Irvine, 856 Health Sciences Quad, Irvine, CA 92697-7550, USA.
Department of Statistics, Donald Bren School of Information and Computer Sciences, University of California, Irvine, Bren Hall 2019, Irvine, CA 92697-1250, USA.
Lancet Reg Health Am. 2025 Feb 13;43:101013. doi: 10.1016/j.lana.2025.101013. eCollection 2025 Mar.
Dementia is an increasing concern among American Indian and Alaska Native (AI/AN) communities, yet machine learning models utilizing electronic health record (EHR) data have not been developed or validated for this population. This study aimed to develop a two-year dementia risk prediction model for AI/AN individuals actively using Indian Health Service (IHS) and Tribal health services.
Seven years of data were obtained from the IHS National Data Warehouse and related EHR databases and divided into a five-year baseline period (FY2007-2011) and a two-year dementia prediction period (FY2012-2013). Four algorithms were assessed: logistic regression, Least Absolute Shrinkage and Selection Operator (LASSO), random forest, and eXtreme Gradient Boosting (XGBoost). Dementia Risk Score (DRS)-based and extended models were developed for each algorithm, with performance evaluated by the area under the receiver operating characteristic curve (AUC).
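The baseline/prediction split described above can be illustrated with a minimal sketch. It assumes the US federal fiscal-year convention (FY n runs from 1 October of year n-1 to 30 September of year n). The function names `fiscal_year` and `study_period` are hypothetical and are not taken from the study's code.

```python
from datetime import date

# Study windows as stated in the abstract.
BASELINE_FYS = range(2007, 2012)    # FY2007-2011 (five-year baseline)
PREDICTION_FYS = range(2012, 2014)  # FY2012-2013 (two-year prediction period)

def fiscal_year(d: date) -> int:
    """Assumed federal convention: October-December belong to the next FY."""
    return d.year + 1 if d.month >= 10 else d.year

def study_period(d: date) -> str:
    """Label an encounter date as baseline, prediction, or outside the study."""
    fy = fiscal_year(d)
    if fy in BASELINE_FYS:
        return "baseline"
    if fy in PREDICTION_FYS:
        return "prediction"
    return "outside"

# Example: an encounter on 1 Oct 2011 falls in FY2012, the prediction period.
print(study_period(date(2011, 10, 1)))
```

Under this assumption, predictors would be drawn only from encounters labelled "baseline", and incident dementia diagnoses from those labelled "prediction".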
The study cohort included 17,398 AI/AN adults aged ≥65 years who were dementia-free at baseline, of whom 59.8% were female. Over the two-year follow-up, 611 individuals (3.5%) were diagnosed with incident dementia. Extended models for logistic regression, LASSO, and XGBoost performed comparably, with AUCs (95% CI) of 0.83 (0.79, 0.86), 0.83 (0.79, 0.86), and 0.82 (0.79, 0.86), respectively. These top-performing models shared 12 of the 15 highest-ranked predictors, with novel predictors including service utilization.
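To make the reported metric concrete, the sketch below computes an AUC with a 95% interval on toy data. The abstract does not state how the confidence intervals were obtained, so the percentile bootstrap used here is an assumption. The helper names `auc` and `bootstrap_ci` and the toy labels and scores are illustrative only.

```python
import random

def auc(labels, scores):
    """AUC as the probability that a random case scores above a random
    non-case, with ties counted as 0.5 (Mann-Whitney formulation)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one non-case")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def bootstrap_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the AUC (assumed method)."""
    auc(labels, scores)  # raises early if only one class is present
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # skip resamples containing a single class
            stats.append(auc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper

# Toy data: 1 = incident dementia, 0 = no dementia; scores are predicted risks.
labels = [0, 0, 1, 0, 1, 1, 0, 1]
scores = [0.1, 0.4, 0.35, 0.8, 0.7, 0.9, 0.2, 0.6]
print(auc(labels, scores))           # 12 of 16 case/non-case pairs ranked correctly
print(bootstrap_ci(labels, scores))  # very wide on a sample this small
```

The pairwise comparison is quadratic in sample size. For a cohort of the size reported here, a rank-based implementation would be the more practical choice.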
Machine learning algorithms utilizing EHR data can effectively predict two-year dementia risk among AI/AN older adults. These models could aid IHS and Tribal health clinicians in identifying high-risk individuals, facilitating timely interventions and improved care coordination.
Funding: National Institutes of Health (NIH).