Danso Samuel O, Zeng Zhanhang, Muniz-Terrera Graciela, Ritchie Craig W
Edinburgh Dementia Prevention, Centre for Clinical Brain Sciences, University of Edinburgh Medical School, Edinburgh, United Kingdom.
School of Informatics, University of Edinburgh, Edinburgh, United Kingdom.
Front Big Data. 2021 May 26;4:613047. doi: 10.3389/fdata.2021.613047. eCollection 2021.
Alzheimer's disease (AD) has its onset many decades before dementia develops, and work is ongoing to characterise individuals at risk of decline on the basis of early detection through biomarker and cognitive testing, as well as the presence or absence of identified risk factors. Risk prediction models for AD based on various computational approaches, including machine learning, are being developed with promising results. However, these approaches have been criticised for their inability to generalise, owing to over-reliance on a single data source, poor internal and external validation, and limited understanding of the prediction models, all of which restrict their clinical utility. We propose a framework that combines a transfer-learning paradigm with ensemble learning algorithms to develop explainable, personalised risk prediction models for dementia. Our prediction models, known as source models, are initially trained and tested on a publicly available dataset (n = 84,856, mean age = 69 years) with 14 years of follow-up samples to predict an individual's risk of developing dementia. The decision boundaries of the best source model are then updated using an alternative dataset from a different and much younger population (n = 473, mean age = 52 years) to obtain an additional prediction model, known as the target model. We further apply the SHapley Additive exPlanations (SHAP) algorithm to visualise the risk factors responsible for the predictions at both the population and individual levels. The best source model achieves a geometric accuracy of 87%, specificity of 99%, and sensitivity of 76%. Compared with a baseline model, our target model performs better across several metrics, with increases in geometric accuracy of 16.9%, specificity of 2.7%, sensitivity of 19.1%, and area under the receiver operating characteristic curve (AUROC) of 11%, and a transfer-learning efficacy rate of 20.6%.
The strengths of our approach are the large sample size used to train the source model, the transfer and application of that "knowledge" to another dataset from a different, undiagnosed population for the early detection and prediction of dementia risk, and the ability to visualise the interactions among the risk factors that drive the prediction. This approach has direct clinical utility.
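The transfer-learning step described in the abstract, together with the reported performance metrics, can be sketched as follows. This is a minimal illustration on synthetic data, not the authors' implementation: it uses scikit-learn's `warm_start` mechanism to grow additional gradient-boosted trees on a second cohort, and all dataset sizes, features, and labels here are invented for the example. Geometric accuracy is taken to be the geometric mean of sensitivity and specificity, a common convention for imbalanced classification.

```python
# Hedged sketch: transfer learning by continuing to train a boosted
# ensemble on a second (smaller, shifted) synthetic cohort.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import confusion_matrix

rng = np.random.default_rng(0)

def make_cohort(n, shift=0.0):
    """Synthetic stand-in for a cohort: five risk-factor columns and a
    binary outcome loosely tied to the first column. Purely illustrative."""
    X = rng.normal(shift, 1.0, size=(n, 5))
    y = (X[:, 0] + rng.normal(0.0, 1.0, n) > 1.0).astype(int)
    return X, y

# "Source" cohort (large, older population in the paper) and "target"
# cohort (small, younger population) -- sizes here are arbitrary.
X_src, y_src = make_cohort(5000)
X_tgt, y_tgt = make_cohort(400, shift=0.5)

# Train the source model on the large cohort.
model = GradientBoostingClassifier(
    n_estimators=100, warm_start=True, random_state=0
)
model.fit(X_src, y_src)

# Transfer step: keep the existing trees and fit additional ones on the
# target cohort, nudging the decision boundaries toward the new population.
model.n_estimators = 150
model.fit(X_tgt, y_tgt)

# Evaluate with the metrics named in the abstract.
tn, fp, fn, tp = confusion_matrix(y_tgt, model.predict(X_tgt)).ravel()
sensitivity = tp / (tp + fn)
specificity = tn / (tn + fp)
geometric_accuracy = (sensitivity * specificity) ** 0.5  # geometric mean
print(round(sensitivity, 3), round(specificity, 3), round(geometric_accuracy, 3))
```

In the paper's workflow, a SHAP explainer would then be applied to the fitted ensemble to visualise per-feature contributions at the population and individual levels; that step is omitted here to keep the sketch dependency-free.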