You Jia, Zhang Ya-Ru, Wang Hui-Fu, Yang Ming, Feng Jian-Feng, Yu Jin-Tai, Cheng Wei
Department of Neurology, Huashan Hospital, Institute of Science and Technology for Brain-Inspired Intelligence, State Key Laboratory of Medical Neurobiology and MOE Frontiers Center for Brain Science, Fudan University, Shanghai, China.
Key Laboratory of Computational Neuroscience and Brain-Inspired Intelligence (Fudan University), Ministry of Education, China.
EClinicalMedicine. 2022 Sep 23;53:101665. doi: 10.1016/j.eclinm.2022.101665. eCollection 2022 Nov.
Existing dementia risk models are limited to known risk factors and traditional statistical methods. We aimed to use machine learning (ML) to develop a novel dementia prediction model by leveraging a phenotypically rich variable space of 366 features covering multiple domains of health-related data.
In this longitudinal population-based cohort study of the UK Biobank (UKB), 425,159 participants without dementia were enrolled from 22 recruitment centres across the UK between March 1, 2006 and October 31, 2010. We implemented a data-driven strategy to identify predictors from 366 candidate variables covering a comprehensive range of genetic and environmental factors, and developed an ML model to predict incident dementia and Alzheimer's disease (AD) within five years, within ten years, and over longer periods (median 11.9 [interquartile range 11.2-12.5] years).
During a follow-up of 5,023,337 person-years, 5287 and 2416 participants developed dementia and AD, respectively. We established a novel UKB dementia risk prediction (UKB-DRP) model comprising ten predictors including age, , pairs matching time, leg fat percentage, number of medications taken, reaction time, peak expiratory flow, mother's age at death, long-standing illness, and mean corpuscular volume. The model was internally evaluated for discrimination and calibration using five-fold cross-validation, and was then compared with existing prediction scales. The UKB-DRP model achieved high discriminative accuracy for dementia (AUC 0.848 ± 0.007) and slightly higher accuracy for AD (AUC 0.862 ± 0.015). The model was well calibrated (Hosmer-Lemeshow goodness-of-fit p-value = 0.92), and its predictive power held across different incidence time groups. More importantly, our model clearly outperformed existing models such as the Cardiovascular Risk Factors, Aging, and Incidence of Dementia Risk Score (AUC 0.705 ± 0.008), the Dementia Risk Score (AUC 0.752 ± 0.007), and the Australian National University Alzheimer's Disease Risk Index (AUC 0.584 ± 0.017). The model was internally validated in the general population of European ancestry and White ethnicity; further validation with independent datasets is therefore necessary to confirm these findings.
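As an illustration of how a cross-validated discrimination figure such as "AUC 0.848 ± 0.007" can be produced, the sketch below computes the AUC on each held-out fold as the probability that a randomly chosen case receives a higher risk score than a randomly chosen non-case, then reports the mean and standard deviation across folds. This is a minimal sketch, not the authors' pipeline: the function names `auc` and `summarize_folds` and the toy data are hypothetical, and the use of the sample standard deviation is an assumption, since the abstract does not say how the ± term was computed.

```python
from statistics import mean, stdev


def auc(labels, scores):
    """AUC as the fraction of (case, non-case) pairs ranked correctly.

    labels: 1 for an incident case, 0 for a non-case.
    scores: predicted risk; higher means higher predicted risk.
    Tied pairs count as half a correct ranking.
    """
    cases = [s for y, s in zip(labels, scores) if y == 1]
    controls = [s for y, s in zip(labels, scores) if y == 0]
    if not cases or not controls:
        raise ValueError("AUC needs at least one case and one non-case")
    correct = 0.0
    for c in cases:
        for n in controls:
            if c > n:
                correct += 1.0
            elif c == n:
                correct += 0.5
    return correct / (len(cases) * len(controls))


def summarize_folds(fold_results):
    """Return (mean AUC, sample SD of AUC) over held-out folds.

    fold_results: list of (labels, scores) pairs, one per fold.
    Sample SD is an assumption; the source does not specify it.
    """
    aucs = [auc(labels, scores) for labels, scores in fold_results]
    return mean(aucs), stdev(aucs)


# Toy held-out predictions for three folds (illustrative only, not study data).
toy_folds = [
    ([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]),
    ([0, 1, 0, 1], [0.2, 0.9, 0.3, 0.6]),
    ([1, 0, 0, 1], [0.5, 0.5, 0.1, 0.7]),
]
mean_auc, sd_auc = summarize_folds(toy_folds)
print(f"AUC {mean_auc:.3f} ± {sd_auc:.3f}")  # prints "AUC 0.875 ± 0.125"
```

The pairwise loop is quadratic in fold size, which is fine for a toy example; at cohort scale a rank-based computation would be the usual choice.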
Our ML-based UKB-DRP model incorporates ten easily accessible predictors and shows solid predictive power for incident dementia and AD within five years, within ten years, and over longer periods; it can be used to identify individuals at high risk of dementia and AD in the general population.
This study was funded by grants from the Science and Technology Innovation 2030 Major Projects (2022ZD0211600), National Key R&D Program of China (2018YFC1312904, 2019YFA070950), National Natural Science Foundation of China (282071201, 81971032, 82071997), Shanghai Municipal Science and Technology Major Project (2018SHZDZX01), Research Start-up Fund of Huashan Hospital (2022QD002), Excellence 2025 Talent Cultivation Program at Fudan University (3030277001), Shanghai Rising-Star Program (21QA1408700), Medical Engineering Fund of Fudan University (yg2021-013), and the 111 Project (No. B18015).