Georgiev Konstantin, Wang Yiqing, Conkie Andrew, Sinclair Annie, Christodoulou Vyron, Seyedzadeh Saleh, Price Malcolm, Wales Ann, Mills Nicholas L, Shenkin Susan D, McPeake Joanne, Fleuriot Jacques D, Anand Atul
BHF Centre for Cardiovascular Science, Queen's Medical Research Institute, University of Edinburgh, Edinburgh EH16 4TJ, UK.
Red Star, Glasgow G64 2BS, UK.
Brain Commun. 2024 Dec 24;7(1):fcae469. doi: 10.1093/braincomms/fcae469. eCollection 2025.
Predicting the risk of future dementia is essential for primary prevention strategies, particularly in the era of novel immunotherapies. However, few studies have developed population-level prediction models using existing routine healthcare data. In this longitudinal retrospective cohort study, we predicted incident dementia at 5, 10 and 13 years using primary and secondary care health records in 144 113 Scottish older adults who were dementia-free prior to 1 April 2009. Gradient-boosting (XGBoost) prediction models were trained on two feature subsets: data-driven (using all 171 extracted variables) and clinically supervised (22 curated variables). We used a random-stratified internal validation set to rank top predictors in each model, assessing performance stratified by age and socioeconomic deprivation. Predictions were stratified into 10 equally sized risk deciles and ranked by response rate. Over 13 years of follow-up, 11 143 (8%) patients developed dementia. The data-driven models achieved marginally better precision-recall area-under-the-curve scores of 0.18, 0.26 and 0.30 compared to clinically supervised models with scores of 0.17, 0.27 and 0.29 for incident dementia at 5, 10 and 13 years, respectively. The clinically supervised model achieved comparable specificity (0.88, 95% confidence interval [CI] 0.87-0.88) and sensitivity (0.55, 95% CI 0.53-0.57) to the data-driven model for prediction at 13 years. The most important model features were age, deprivation and frailty, measured by a modified electronic frailty index excluding known cognitive deficits. Model precision was consistent across socioeconomic deprivation quintiles but lower in younger-onset (<70 years) dementia cases. At 13 years, dementia was diagnosed in 32% of the population classified as highest risk, with 40% of individuals in this group below the age of 80.
Personalized estimates of future dementia risk from routinely collected healthcare data could influence risk factor modification and help to target brain imaging and novel immunotherapies in selected individuals with pre-symptomatic disease.
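The two evaluation steps named in the abstract, ranking equally sized risk deciles by response rate and summarising performance as the area under the precision-recall curve, can be illustrated with a minimal sketch. Everything here is an assumption for illustration: the data are synthetic (not the study cohort), the function names `decile_response_rates` and `average_precision` are hypothetical, the step-wise average-precision estimator stands in for whichever precision-recall estimator the authors used, and tied scores are not treated specially.

```python
import random


def decile_response_rates(scores, outcomes, n_bins=10):
    """Split individuals into n_bins equally sized groups by predicted risk
    and return the observed event rate per group, highest-risk group first."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    n = len(order)
    rates = []
    for b in range(n_bins):
        members = order[b * n // n_bins:(b + 1) * n // n_bins]
        rates.append(sum(outcomes[i] for i in members) / len(members))
    return rates


def average_precision(scores, outcomes):
    """Step-wise estimate of the area under the precision-recall curve:
    the mean of the precision values at the rank of each true case."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits = 0
    total = 0.0
    for rank, i in enumerate(order, start=1):
        if outcomes[i]:
            hits += 1
            total += hits / rank
    return total / sum(outcomes)


# Synthetic cohort (assumption): skewed individual risks averaging roughly 8%,
# with the "model" score set equal to the true risk for simplicity.
random.seed(0)
risks = [0.3 * random.random() ** 3 for _ in range(20000)]
outcomes = [1 if random.random() < p else 0 for p in risks]

rates = decile_response_rates(risks, outcomes)
ap = average_precision(risks, outcomes)
prevalence = sum(outcomes) / len(outcomes)

print("Response rate by decile (highest risk first):")
for d, r in enumerate(rates, start=1):
    print(f"  decile {d}: {r:.3f}")
print(f"Average precision: {ap:.3f} (baseline prevalence {prevalence:.3f})")
```

A precision-recall score is best read against the outcome prevalence, which is what an uninformative model would achieve. On that reading, the reported scores of 0.17 to 0.30 sit well above the 8% of patients who developed dementia over 13 years.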