Department of Computer Science and Applied Mathematics, Weizmann Institute of Science, Rehovot, Israel.
Department of Molecular Cell Biology, Weizmann Institute of Science, Rehovot, Israel.
Elife. 2022 Jun 22;11:e71862. doi: 10.7554/eLife.71862.
Type 2 diabetes (T2D) accounts for ~90% of all cases of diabetes, resulting in an estimated 6.7 million deaths in 2021, according to the International Diabetes Federation. Early detection of patients at high risk of developing T2D can reduce the incidence of the disease through changes in lifestyle, diet, or medication. Since populations of lower socio-demographic status are more susceptible to T2D and might have limited resources or limited access to sophisticated computational tools, there is a need for accurate yet accessible prediction models.
In this study, we analyzed data from 44,709 nondiabetic UK Biobank participants aged 40-69, predicting the risk of T2D onset within a selected time frame (mean of 7.3 years with an SD of 2.3 years). We started with 798 features that we identified as potential predictors of T2D onset. We first analyzed the data using gradient boosting decision trees, survival analysis, and logistic regression methods. We devised one nonlaboratory model accessible to the general population and one more precise yet simple model that utilizes laboratory tests. We simplified both models to an accessible scorecard form, tested the models on normoglycemic and prediabetes subcohorts, and compared the results with those of the general cohort. We established the nonlaboratory model using the following covariates: sex, age, weight, height, waist size, hip circumference, waist-to-hip ratio, and body mass index. For the laboratory model, we used age and sex together with four common blood tests: high-density lipoprotein (HDL), gamma-glutamyl transferase, glycated hemoglobin, and triglycerides. As an external validation dataset, we used the electronic medical record database of Clalit Health Services.
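To illustrate what a nonlaboratory "scorecard form" can look like, the following is a minimal sketch, not the published model. The formulas for body mass index and waist-to-hip ratio are standard; every bin edge, point value, and risk-group threshold below is a hypothetical placeholder invented for illustration, and sex is omitted to keep the example short.

```python
from bisect import bisect_right


def bmi(weight_kg, height_cm):
    """Body mass index in kg/m^2 (standard definition)."""
    height_m = height_cm / 100.0
    return weight_kg / (height_m ** 2)


def waist_to_hip_ratio(waist_cm, hip_cm):
    """Waist circumference divided by hip circumference."""
    return waist_cm / hip_cm


# HYPOTHETICAL scorecard: (bin edges, points per bin). These numbers are
# invented for illustration and are NOT the values from the study.
SCORECARD = {
    "age": ([50, 60], [0, 2, 4]),
    "bmi": ([25.0, 30.0], [0, 3, 6]),
    "whr": ([0.85, 0.95], [0, 3, 6]),
}

# HYPOTHETICAL thresholds on the total score that define three risk groups.
RISK_GROUP_EDGES = [5, 11]
RISK_GROUP_NAMES = ["low", "medium", "high"]


def points_for(feature, value):
    """Look up the points of the bin that `value` falls into."""
    edges, points = SCORECARD[feature]
    return points[bisect_right(edges, value)]


def score_participant(age, weight_kg, height_cm, waist_cm, hip_cm):
    """Return (total points, risk group name) for one participant."""
    features = {
        "age": age,
        "bmi": bmi(weight_kg, height_cm),
        "whr": waist_to_hip_ratio(waist_cm, hip_cm),
    }
    total = sum(points_for(name, value) for name, value in features.items())
    group = RISK_GROUP_NAMES[bisect_right(RISK_GROUP_EDGES, total)]
    return total, group


if __name__ == "__main__":
    print(score_participant(45, 70, 175, 80, 100))
    print(score_participant(65, 100, 170, 110, 100))
```

The appeal of this form is that it needs only a lookup table and addition, so it can be applied on paper without any computational resources.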
The nonlaboratory scorecard model achieved an area under the receiver operating characteristic curve (auROC) of 0.81 (95% confidence interval [CI] 0.77-0.84) and an odds ratio (OR) between the upper and fifth prevalence deciles of 17.2 (95% CI 5-66). Using this model, we classified the cohort into three risk groups with a 1% (0.8-1%), 5% (3-6%), and 9% (7-12%) risk of developing T2D, respectively. We further analyzed the contribution of laboratory data and devised a blood test scorecard model based on age, sex, and the four common blood tests noted above: glycated hemoglobin (HbA1c%), gamma-glutamyl transferase, triglycerides, and HDL cholesterol. This model achieved an auROC of 0.87 (95% CI 0.85-0.90) and a deciles' OR of 48 (95% CI 12-109). Using this model, we classified the cohort into four risk groups with the following risks of developing T2D: 0.5% (0.4-7%); 3% (2-4%); 10% (8-12%); and a high-risk group of 23% (10-37%). When applying the blood test model to the external validation cohort (Clalit), we achieved an auROC of 0.75 (95% CI 0.74-0.75). We analyzed several additional, more comprehensive models, which included genotyping data and other environmental factors. We found that these models did not provide cost-efficient benefits over the four blood test model. The commonly used German Diabetes Risk Score (GDRS) and Finnish Diabetes Risk Score (FINDRISC) models, trained using our data, achieved an auROC of 0.73 (0.69-0.76) and 0.66 (0.62-0.70), respectively, inferior to the results achieved by the four blood test model and by the anthropometry models.
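The two headline metrics reported here, the auROC and an odds ratio between two score deciles, can be computed as in the minimal sketch below. It assumes the usual rank-based definition of the auROC and the usual definition of odds as cases divided by non-cases; the function names and the toy numbers are illustrative only and are not taken from the study's analysis code.

```python
def auroc(labels, scores):
    """Area under the ROC curve via its rank interpretation: the probability
    that a randomly chosen case scores higher than a randomly chosen
    non-case, counting ties as one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one non-case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def odds_ratio(cases_a, total_a, cases_b, total_b):
    """Odds of disease in group A divided by the odds in group B,
    where odds = cases / non-cases within each group."""
    odds_a = cases_a / (total_a - cases_a)
    odds_b = cases_b / (total_b - cases_b)
    return odds_a / odds_b


if __name__ == "__main__":
    # Toy data: 1 = developed T2D, 0 = did not.
    y = [0, 0, 1, 1]
    risk = [0.1, 0.4, 0.35, 0.8]
    print(auroc(y, risk))

    # Toy comparison: 20 cases among 100 people in the top score decile
    # versus 2 cases among 100 people in a reference decile.
    print(odds_ratio(20, 100, 2, 100))
```

With the toy inputs above, three of the four case/non-case pairs are ranked correctly, and the odds in the top group are 0.25 against roughly 0.02 in the reference group.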
The four blood test and anthropometric models outperformed the commonly used nonlaboratory models, the FINDRISC and the GDRS. We suggest that our models be used as tools for decision-makers to assess populations at elevated T2D risk and thus improve medical strategies. These models might also provide a personal catalyst for lifestyle, diet, or medication changes that lower the risk of T2D onset.
The funders had no role in study design, data collection, interpretation, or the decision to submit the work for publication.