Stern Amos, Linial Michal
The Rachel and Selim Benin School of Computer Science and Engineering, The Hebrew University of Jerusalem, Jerusalem, Israel.
Department of Biological Chemistry, The Life Science Institute, The Hebrew University of Jerusalem, 91904, Jerusalem, Israel.
Geroscience. 2025 Aug 27. doi: 10.1007/s11357-025-01828-x.
Dementia, particularly Alzheimer's disease (AD), presents a growing global health challenge characterized by cognitive decline, behavioral changes, and loss of independence. With increasing life expectancy, early diagnosis and improved clinical strategies are urgently needed. This study developed and evaluated machine learning (ML) models to predict AD risk using UK Biobank data, integrating health, genetic, and lifestyle factors. The cohort included 2878 AD cases and 72,366 controls. Among several algorithms, CatBoost performed best (ROC-AUC = 0.773), especially in females. Inputs included ICD-10 codes from 5 years pre-diagnosis, ApoE-ε4 genotype, and a large collection of modifiable risk factors. Despite having fewer cases, the risk prediction models for vascular dementia (VaD) outperformed the AD-specific models. ApoE-ε4 was the most predictive genetic marker, while other common variants had limited utility. Key non-genetic predictors included comorbidities (e.g., diabetes, hypertension), education, physical activity, and diet. These findings highlight the value of integrating diverse data sources for dementia risk prediction and emphasize the role of sex-specific modeling and modifiable factors in early, personalized intervention strategies.