Ravaut Mathieu, Sadeghi Hamed, Leung Kin Kwan, Volkovs Maksims, Kornas Kathy, Harish Vinyas, Watson Tristan, Lewis Gary F, Weisman Alanna, Poutanen Tomi, Rosella Laura
Layer 6 AI, Toronto, ON, Canada.
Department of Computer Science, University of Toronto, Toronto, ON, Canada.
NPJ Digit Med. 2021 Feb 12;4(1):24. doi: 10.1038/s41746-021-00394-8.
Across jurisdictions, governments and health insurance providers hold large amounts of data from patient interactions with the healthcare system. We aimed to develop a machine learning-based model for predicting adverse outcomes due to diabetes complications using administrative health data from the single-payer health system in Ontario, Canada. A Gradient Boosting Decision Tree model was trained on data from 1,029,366 patients, validated on 272,864 patients, and tested on 265,406 patients. Discrimination was assessed using the AUC statistic, and calibration was assessed visually using calibration plots, both overall and across population subgroups. Our model predicting three-year risk of adverse outcomes due to diabetes complications (hyper/hypoglycemia, tissue infection, retinopathy, cardiovascular events, amputation) included 700 features from multiple diverse data sources and had strong discrimination (average test AUC = 77.7, range 77.7-77.9). Through the design and validation of a high-performance model to predict adverse outcomes of diabetes complications at the population level, we demonstrate the potential of machine learning and administrative health data to inform health planning and healthcare resource allocation for diabetes management.
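The two evaluation steps named in the abstract, discrimination via the AUC statistic and calibration via calibration plots, can be illustrated with a minimal sketch. This is not the authors' code: the function names `auc` and `calibration_bins`, the toy labels and scores, and the choice of five equal-width bins are all assumptions made for illustration. Note that the sketch returns AUC on a 0-1 scale, whereas the abstract appears to report it on a 0-100 scale.

```python
from typing import List, Sequence, Tuple


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive case is scored above
    a random negative case; ties count as half (Mann-Whitney formulation)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def calibration_bins(
    labels: Sequence[int], scores: Sequence[float], n_bins: int = 5
) -> List[Tuple[float, float, int]]:
    """Group predictions into equal-width risk bins and return, for each
    non-empty bin, (mean predicted risk, observed event rate, count).
    Plotting the first value against the second gives a calibration plot."""
    bins: List[List[Tuple[int, float]]] = [[] for _ in range(n_bins)]
    for y, s in zip(labels, scores):
        idx = min(int(s * n_bins), n_bins - 1)  # a score of 1.0 goes in the last bin
        bins[idx].append((y, s))
    summary = []
    for members in bins:
        if members:
            mean_pred = sum(s for _, s in members) / len(members)
            observed = sum(y for y, _ in members) / len(members)
            summary.append((mean_pred, observed, len(members)))
    return summary


# Hypothetical toy data: 1 = adverse outcome occurred, score = predicted risk.
y_true = [0, 0, 1, 0, 1, 1, 0, 1]
y_score = [0.1, 0.3, 0.35, 0.45, 0.65, 0.85, 0.7, 0.9]

toy_auc = auc(y_true, y_score)
toy_bins = calibration_bins(y_true, y_score)

print(f"AUC: {toy_auc:.4f}")
for mean_pred, observed, count in toy_bins:
    print(f"predicted {mean_pred:.3f}  observed {observed:.3f}  n={count}")
```

A well-calibrated model has observed event rates close to the mean predicted risk in every bin. Repeating the binning within each population subgroup would correspond to the subgroup calibration checks the abstract mentions. The pairwise AUC loop is quadratic in the number of cases, so a cohort of the size reported here would call for a rank-based implementation instead.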