Department of the Fifth Tuberculosis, Chongqing Public Health Medical Center, Chongqing, People's Republic of China.
Department of Endocrinology, First Affiliated Hospital of Anhui Medical University, Hefei, 230022, Anhui, People's Republic of China.
Sci Rep. 2024 Mar 21;14(1):6814. doi: 10.1038/s41598-024-57446-8.
The present study aims to assess, at an early stage, the treatment outcome of patients with both diabetes and tuberculosis (TB-DM) using machine learning (ML) based on electronic medical records (EMRs). A total of 429 patients from Chongqing Public Health Medical Center were included. The random-forest-based Boruta algorithm was used to select the essential variables, and four models were built and evaluated under a fivefold cross-validation scheme. Furthermore, we used SHapley Additive exPlanations (SHAP) to interpret the results of the tree-based model. Nine of the 69 candidate features were chosen as predictors. Among these predictors, the type of resistance was the most important feature, followed by activated partial thromboplastin time (APTT), thrombin time (TT), platelet distribution width (PDW), and prothrombin time (PT). All the models we established achieved an AUC above 0.7, indicating good predictive performance. XGBoost, the best-performing model, predicted the risk of treatment failure in the test set with an AUC of 0.9281. These results suggest that the machine learning approach presented here (XGBoost) can identify patients with TB-DM at higher risk of treatment failure at an early stage based on EMRs. Applying machine learning to convenient, low-cost EMRs provides new insight into TB-DM treatment strategies in low- and middle-income countries.
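The evaluation scheme named above (fivefold cross-validation scored by AUC) can be illustrated with a minimal, standard-library sketch. It is not the authors' code: the function names `roc_auc` and `stratified_folds` are hypothetical, stratifying the folds by outcome is an assumption the abstract does not state, and the labels and scores are toy values rather than study data.

```python
import random


def roc_auc(labels, scores):
    """AUC as the chance that a random positive outscores a random negative.

    Ties between a positive and a negative count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def stratified_folds(labels, k=5, seed=0):
    """Split sample indices into k folds with a similar class ratio in each.

    Stratification is an assumption made here so that every fold contains
    both outcomes; the abstract only says "fivefold cross-validation".
    """
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        # Deal the shuffled indices of this class round-robin across folds.
        for j, i in enumerate(idx):
            folds[j % k].append(i)
    return folds


# Toy example: 1 = treatment failure, 0 = treatment success (made-up values).
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
toy_auc = roc_auc(toy_labels, toy_scores)

# Toy cohort of 50 patients with 10 failures, split into five folds.
cohort = [1] * 10 + [0] * 40
folds = stratified_folds(cohort, k=5, seed=0)
```

In a full pipeline, each fold would serve once as the held-out set while a model is fitted on the other four, and `roc_auc` would be applied to the held-out predictions; the sketch covers only the splitting and scoring steps.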