The Novo Nordisk Foundation Center for Basic Metabolic Research, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark.
Department of Epidemiology Research, Statens Serum Institut, Copenhagen, Denmark.
Sci Rep. 2020 Feb 4;10(1):1776. doi: 10.1038/s41598-020-58601-7.
Identifying individuals at risk of developing disease comorbidities is an important task in tackling the growing personal and societal burdens associated with chronic diseases. We used machine learning techniques to investigate the extent to which data from longitudinal, nationwide Danish health registers can be used to predict which individuals are at high risk of developing type 2 diabetes (T2D) comorbidities. Using logistic regression, random forest, and gradient boosting models and register data spanning hospitalizations, drug prescriptions, and contacts with primary care contractors from >200,000 individuals newly diagnosed with T2D, we predicted five-year risk of heart failure (HF), myocardial infarction (MI), stroke (ST), cardiovascular disease (CVD), and chronic kidney disease (CKD). For HF, MI, CVD, and CKD, register-based models outperformed a reference model built on canonical individual characteristics, improving the area under the receiver operating characteristic curve by 0.06, 0.03, 0.04, and 0.07, respectively. The top 1,000 patients predicted to be at highest risk exhibited observed incidence ratios exceeding 4.99, 3.52, 1.97, and 4.71, respectively. In summary, prediction of T2D comorbidities using Danish registers led to consistent, albeit modest, performance improvements over reference models, suggesting that register data could be used to systematically identify individuals at risk of developing disease comorbidities.
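The two evaluation quantities named above, the area under the receiver operating characteristic curve (AUROC) and the incidence ratio among the highest-risk patients, can be illustrated with a minimal Python sketch. This is not the authors' pipeline. The function names, the synthetic scores, and the definition of the incidence ratio as incidence in the top k divided by incidence in the whole cohort are all assumptions made for illustration.

```python
import random


def auroc(scores, labels):
    """Rank-based AUROC (Mann-Whitney U statistic), using average ranks for ties."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole group of tied scores.
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tied group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def top_k_incidence_ratio(scores, labels, k):
    """Incidence among the k highest-scored individuals divided by cohort incidence.

    Assumption: this is one plausible reading of "observed incidence ratio".
    """
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    top_incidence = sum(labels[i] for i in top) / k
    overall_incidence = sum(labels) / len(labels)
    return top_incidence / overall_incidence


def demo(n=5000, k=100, seed=0):
    """Compare two hypothetical risk scores on purely synthetic data."""
    rng = random.Random(seed)
    latent = [rng.gauss(0, 1) for _ in range(n)]
    # Synthetic outcome: an event occurs when latent risk plus noise is high.
    labels = [1 if x + rng.gauss(0, 1) > 1.5 else 0 for x in latent]
    # Hypothetical "reference" score: a noisier view of the latent risk.
    reference = [x + rng.gauss(0, 2.0) for x in latent]
    # Hypothetical "register-based" score: a less noisy view.
    register = [x + rng.gauss(0, 1.0) for x in latent]
    return {
        "auroc_reference": auroc(reference, labels),
        "auroc_register": auroc(register, labels),
        "top_k_ratio_register": top_k_incidence_ratio(register, labels, k),
    }


if __name__ == "__main__":
    for name, value in demo().items():
        print(f"{name}: {value:.3f}")
```

Subtracting `auroc_reference` from `auroc_register` gives the kind of AUROC improvement the abstract reports. The numbers this sketch prints come from simulated data and have no relation to the published results.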