Department of Public Health, University of Otago Wellington, Wellington City, Wellington, 6021, New Zealand.
John Curtin School of Medical Research, Australian National University, Canberra City, ACT, 2601, Australia.
BMC Med Inform Decis Mak. 2024 Sep 27;24(1):274. doi: 10.1186/s12911-024-02678-x.
In the age of big data, linked social and administrative health data combined with machine learning (ML) are increasingly used to improve prediction in chronic disease, e.g., cardiovascular disease (CVD). In this study we aimed to apply ML methods to extensive national-level health and social administrative datasets to assess their utility for predicting future diabetes complications, including by ethnicity.
Five ML models were used to predict CVD events among all people with known diabetes in the population of New Zealand, utilizing nationwide individual-level administrative data.
The XGBoost model had the best predictive performance for CVD events three years into the future among the population with diabetes (N = 145,600). The optimization procedure also found limited improvement in prediction by ethnicity, as measured by the area under the receiver operating characteristic curve (AUC). The results indicated no trade-off between model predictive performance and the equity gap in prediction by ethnicity; that is, improving model prediction and reducing performance gaps by ethnicity can be achieved simultaneously. The most important variables differed across models and ethnic groups; they included, for example, age, neighborhood-level deprivation, having had a hospitalization event, and the number of years living with diabetes.
We provide further evidence that ML with administrative health data can be used for meaningful future prediction of health outcomes. As such, it could be utilized to inform health planning and healthcare resource allocation for diabetes management and the prevention of CVD events. Our results may suggest limited scope for developing prediction models by ethnic group, and that the major way to reduce inequitable health outcomes is probably improved delivery of prevention and management to those groups with diabetes at highest need.
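As a rough illustration of the subgroup evaluation described in the results, the sketch below computes AUC separately per group and summarizes the spread as a single gap. It is not the study's code: the function names, the toy data, and the definition of the equity gap as the difference between the highest and lowest group AUC are all assumptions made for this example, since the abstract does not state how the gap was measured.

```python
from collections import defaultdict


def auc(labels, scores):
    """AUC as the probability that a random positive case is scored
    above a random negative case (ties count as 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def auc_by_group(labels, scores, groups):
    """Compute AUC separately within each group label."""
    buckets = defaultdict(lambda: ([], []))
    for y, s, g in zip(labels, scores, groups):
        buckets[g][0].append(y)
        buckets[g][1].append(s)
    return {g: auc(ys, ss) for g, (ys, ss) in buckets.items()}


def equity_gap(per_group):
    """Hypothetical gap measure: highest minus lowest group AUC."""
    return max(per_group.values()) - min(per_group.values())


# Toy data only: 1 = CVD event within the horizon, scores = predicted risk.
labels = [1, 0, 1, 0, 1, 0, 0, 1]
scores = [0.9, 0.2, 0.7, 0.4, 0.6, 0.7, 0.1, 0.8]
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]

per_group = auc_by_group(labels, scores, groups)
gap = equity_gap(per_group)
print(per_group, gap)
```

Under this definition, a model change that raises overall AUC while also shrinking the gap would correspond to the "no trade-off" pattern the abstract reports. The pairwise AUC here is quadratic in group size, so a rank-based implementation would be preferable at national scale.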