Abegaz Tadesse M, Ahmed Muktar, Sherbeny Fatimah, Diaby Vakaramoko, Chi Hongmei, Ali Askal Ayalew
Economic, Social and Administrative Pharmacy (ESAP), College of Pharmacy and Pharmaceutical Sciences, Institute of Public Health, Florida A&M University, Tallahassee, FL 32307, USA.
Adelaide Medical School, Faculty of Health and Medical Sciences, The University of Adelaide, Adelaide, SA 5005, Australia.
Healthcare (Basel). 2023 Apr 15;11(8):1138. doi: 10.3390/healthcare11081138.
There is a paucity of predictive models for uncontrolled diabetes mellitus. The present study applied several machine learning algorithms to multiple patient characteristics to predict uncontrolled diabetes. Patients with diabetes older than 18 years from the All of Us Research Program were included. Random forest, extreme gradient boosting (XGBoost), logistic regression, and weighted ensemble algorithms were employed. Patients with a record of uncontrolled diabetes based on International Classification of Diseases (ICD) codes were identified as cases. The models used a set of features including basic demographics, biomarkers, and hematological indices. The random forest model achieved the highest accuracy in predicting uncontrolled diabetes, 0.80 (95% CI: 0.79-0.81), compared with 0.74 (95% CI: 0.73-0.75) for extreme gradient boosting, 0.64 (95% CI: 0.63-0.65) for logistic regression, and 0.77 (95% CI: 0.76-0.79) for the weighted ensemble model. The area under the receiver operating characteristic curve (AUC) ranged from 0.70 (logistic regression model) to 0.77 (random forest model). Potassium level, body weight, aspartate aminotransferase, height, and heart rate were important predictors of uncontrolled diabetes. In conclusion, the random forest model performed well in predicting uncontrolled diabetes, and serum electrolytes and physical measurements were important predictive features. Machine learning techniques that incorporate these clinical characteristics may be used to predict uncontrolled diabetes.
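The abstract does not state how the ensemble weights, the decision threshold, or the accuracy confidence intervals were obtained. The sketch below is illustrative only and assumes a soft-voting ensemble (a weighted average of each model's predicted probability of uncontrolled diabetes), a 0.5 threshold, a normal-approximation (Wald) 95% interval for accuracy, and AUC computed as the probability that a randomly chosen case scores higher than a randomly chosen control. The function names, weights, and toy data are hypothetical and are not taken from the study.

```python
import math
from typing import Sequence


def weighted_ensemble(
    probs_per_model: Sequence[Sequence[float]], weights: Sequence[float]
) -> list:
    """Hypothetical soft voting: weighted average of each model's predicted
    probability of the positive class (1 = uncontrolled diabetes)."""
    total = sum(weights)
    n_samples = len(probs_per_model[0])
    return [
        sum(w * probs[i] for w, probs in zip(weights, probs_per_model)) / total
        for i in range(n_samples)
    ]


def accuracy_with_wald_ci(y_true, y_prob, threshold=0.5, z=1.96):
    """Accuracy at a fixed threshold, with an assumed Wald (normal-approximation)
    confidence interval clipped to [0, 1]."""
    n = len(y_true)
    correct = sum(int(p >= threshold) == y for y, p in zip(y_true, y_prob))
    acc = correct / n
    half_width = z * math.sqrt(acc * (1 - acc) / n)
    return acc, max(0.0, acc - half_width), min(1.0, acc + half_width)


def roc_auc(y_true, y_score):
    """AUC as the fraction of (case, control) pairs ranked correctly; ties count 1/2."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical toy labels and scores from two models (e.g. a random forest
    # and a logistic regression); these are NOT data from the study.
    y = [1, 0, 1, 1, 0, 0, 1, 0]
    rf = [0.9, 0.2, 0.7, 0.6, 0.4, 0.1, 0.8, 0.3]
    lr = [0.6, 0.5, 0.4, 0.7, 0.6, 0.3, 0.5, 0.4]
    ens = weighted_ensemble([rf, lr], weights=[0.7, 0.3])
    print("accuracy (95% CI):", accuracy_with_wald_ci(y, ens))
    print("AUC:", roc_auc(y, ens))
```

Soft voting is only one way to build a weighted ensemble. The Wald interval also behaves poorly near 0 or 1 (it collapses to zero width at 100% accuracy), so a Wilson or bootstrap interval may be a safer choice on small test sets.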