Prediction of progression from pre-diabetes to diabetes: Development and validation of a machine learning model.

Affiliations

Diabetes Unit, Dept. of Endocrinology and Metabolism, Hadassah University Hospital, Hebrew University of Jerusalem, The Faculty of Medicine, Jerusalem, Israel.

Medial EarlySign, Hod Hasharon, Israel.

Publication information

Diabetes Metab Res Rev. 2020 Feb;36(2):e3252. doi: 10.1002/dmrr.3252. Epub 2020 Jan 14.

DOI: 10.1002/dmrr.3252

PMID: 31943669

Abstract

AIMS

Identification, a priori, of those at high risk of progression from pre-diabetes to diabetes may enable targeted delivery of interventional programmes while avoiding the burden of prevention and treatment in those at low risk. We studied whether a machine-learning model using patient data from electronic medical records can improve the prediction of incident diabetes.

METHODS

A machine-learning model predicting progression from pre-diabetes to diabetes was developed using gradient-boosted trees. The model was trained on data from The Health Improvement Network (THIN) database cohort, internally validated on THIN data not used for training, and externally validated on the Canadian AppleTree and the Israeli Maccabi Health Services (MHS) data sets. Within each data set, its predictive ability was compared with that of a logistic-regression model.
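The abstract names gradient-boosted trees but not the library, tree depth or hyperparameters. As a minimal pure-Python sketch of the technique, the code below boosts depth-1 trees (stumps) on the binary log-loss, using second-order (Newton) leaf values of the kind popularised by XGBoost. Everything specific here is an assumption for illustration: the two features (glucose in mg/dL, HbA1c in %), the toy rows, and the names `fit_gbt`, `n_rounds`, `learning_rate` and `reg_lambda`. It is not the authors' model, which used 69 variables derived from 11 signals.

```python
import math

# Sketch only: stumps, Newton leaf values and all defaults are assumptions,
# not the study's settings.
# Hypothetical toy cohort, invented for illustration (not study data):
# each row is [fasting glucose in mg/dL, HbA1c in %]; label 1 = progressed to diabetes.
TOY_X = [
    [101, 5.7], [104, 5.8], [108, 5.9], [110, 6.0], [112, 5.8],
    [103, 6.3], [118, 5.8], [121, 6.0], [125, 6.4], [106, 6.5],
]
TOY_Y = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_stump(X, grad, hess, reg_lambda):
    """Fit a depth-1 tree to the log-loss gradients.

    Picks the (feature, threshold) maximising G_L^2/(H_L+lambda) + G_R^2/(H_R+lambda)
    and returns Newton-step leaf values -G/(H+lambda) for each side.
    """
    best_gain, best = -math.inf, None
    for j in range(len(X[0])):
        # Drop the largest value so both sides of the split are non-empty.
        for t in sorted({row[j] for row in X})[:-1]:
            gl = hl = gr = hr = 0.0
            for row, g, h in zip(X, grad, hess):
                if row[j] <= t:
                    gl, hl = gl + g, hl + h
                else:
                    gr, hr = gr + g, hr + h
            gain = gl * gl / (hl + reg_lambda) + gr * gr / (hr + reg_lambda)
            if gain > best_gain:
                best_gain = gain
                best = (j, t, -gl / (hl + reg_lambda), -gr / (hr + reg_lambda))
    return best


def fit_gbt(X, y, n_rounds=50, learning_rate=0.3, reg_lambda=1.0):
    """Gradient boosting with stumps on the binary log-loss.

    Assumes y contains both classes (the starting log-odds is undefined otherwise).
    """
    prior = sum(y) / len(y)
    base = math.log(prior / (1.0 - prior))  # start from the log-odds of the outcome rate
    scores = [base] * len(y)
    stumps = []
    for _ in range(n_rounds):
        p = [sigmoid(s) for s in scores]
        grad = [pi - yi for pi, yi in zip(p, y)]  # first derivative of log-loss w.r.t. score
        hess = [pi * (1.0 - pi) for pi in p]      # second derivative
        stump = fit_stump(X, grad, hess, reg_lambda)
        if stump is None:  # every feature is constant: nothing to split on
            break
        j, t, w_left, w_right = stump
        stumps.append(stump)
        scores = [s + learning_rate * (w_left if row[j] <= t else w_right)
                  for s, row in zip(scores, X)]
    return base, learning_rate, stumps


def predict_proba(model, row):
    """Sum the shrunken stump outputs on the log-odds scale, then map to a probability."""
    base, learning_rate, stumps = model
    score = base + sum(learning_rate * (w_left if row[j] <= t else w_right)
                       for j, t, w_left, w_right in stumps)
    return sigmoid(score)


def mean_log_loss(model, X, y):
    """Average binary cross-entropy of the model's predicted probabilities."""
    eps = 1e-15
    total = 0.0
    for row, yi in zip(X, y):
        p = min(max(predict_proba(model, row), eps), 1.0 - eps)
        total -= math.log(p) if yi == 1 else math.log(1.0 - p)
    return total / len(y)


if __name__ == "__main__":
    model = fit_gbt(TOY_X, TOY_Y)
    print(round(mean_log_loss(model, TOY_X, TOY_Y), 3))
    print(round(predict_proba(model, [115, 6.1]), 3))
```

Because each stump depends on a single feature, the summed model is additive but can still represent threshold effects (for example a jump in risk above a glucose cut-off) that a logistic regression on raw values, being linear in its inputs on the log-odds scale, cannot express without hand-built features; deeper trees would also capture interactions between features.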

RESULTS

A cohort of 852 454 individuals with pre-diabetes (glucose ≥ 100 mg/dL and/or HbA1c ≥ 5.7%) was used for model training, comprising 4.9 million time points and 900 features. The final model was implemented with 69 variables generated from 11 basic signals. The machine-learning model outperformed the logistic-regression model, and its advantage was maintained at all sensitivity levels. AUC [95% CI], machine-learning vs logistic-regression model: THIN data set, 0.865 [0.860, 0.869] vs 0.778 [0.773, 0.784], P < .05; AppleTree data set, 0.907 [0.896, 0.919] vs 0.880 [0.867, 0.894], P < .05; MHS data set, 0.925 [0.923, 0.927] vs 0.876 [0.872, 0.879], P < .05.
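The abstract reports each AUC with a 95% CI and a P value for the between-model difference, but does not say how these were computed. The sketch below assumes a paired percentile bootstrap over patients, which is one common choice (DeLong's test is another). The function names and defaults (`auc`, `paired_bootstrap_auc`, `n_boot`, `seed`) are hypothetical, and the scores in the example are invented.

```python
import random


def auc(scores, labels):
    """ROC AUC as P(random positive scores higher than random negative),
    counting ties as one half (the Mann-Whitney formulation). O(n_pos * n_neg)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def percentile_interval(values, alpha=0.05):
    """Percentile interval taken from sorted bootstrap replicates."""
    values = sorted(values)
    k = len(values) - 1
    return values[int((alpha / 2) * k)], values[int(round((1 - alpha / 2) * k))]


def paired_bootstrap_auc(scores_a, scores_b, labels, n_boot=1000, alpha=0.05, seed=0):
    """Compare two models scored on the same patients (labels are 0/1).

    Each replicate resamples patients with replacement and scores BOTH models on
    that same resample, so the interval for AUC_a - AUC_b reflects their pairing.
    """
    point_a, point_b = auc(scores_a, labels), auc(scores_b, labels)  # also validates labels
    rng = random.Random(seed)
    n = len(labels)
    reps_a, reps_b, reps_diff = [], [], []
    while len(reps_diff) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if not 0 < sum(ys) < n:  # skip resamples containing a single class
            continue
        a = auc([scores_a[i] for i in idx], ys)
        b = auc([scores_b[i] for i in idx], ys)
        reps_a.append(a)
        reps_b.append(b)
        reps_diff.append(a - b)
    return {
        "auc_a": (point_a, percentile_interval(reps_a, alpha)),
        "auc_b": (point_b, percentile_interval(reps_b, alpha)),
        "diff_ci": percentile_interval(reps_diff, alpha),
    }


if __name__ == "__main__":
    # Hypothetical scores for illustration only (not study data).
    labels = [0, 0, 0, 1, 1, 1]
    result = paired_bootstrap_auc([0.1, 0.2, 0.3, 0.7, 0.8, 0.9],
                                  [0.1, 0.6, 0.3, 0.5, 0.8, 0.2], labels)
    print(result)
```

If the `diff_ci` interval lies entirely above zero, the resamples consistently favour model A; this is a rough analogue of the P < .05 comparisons above, not a reproduction of them. The pairwise AUC loop is quadratic, so cohorts of the size reported here would need a rank-based AUC formula instead.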

CONCLUSIONS

Machine-learning models for diabetes prediction preserve their performance across populations and can be integrated into large clinical systems, supporting judicious selection of persons for interventional programmes.
