Phat Nguyen Ky, Lee Yoonah, Vu Dinh Hoa, Long Nguyen Phuoc, Park Seongoh
Department of Pharmacology and PharmacoGenomics Research Center, Inje University College of Medicine, Busan, 47392, Republic of Korea.
Department of Statistics, Sungshin Women's University, Seoul, 02844, Republic of Korea.
BMC Med Inform Decis Mak. 2025 Aug 11;25(1):301. doi: 10.1186/s12911-025-03139-9.
Identifying early predictors of treatment outcomes supports more accurate prognosis and better resource allocation in tuberculosis (TB) management.
This study aimed to predict treatment outcomes of TB patients from a real-world population-wide health record dataset with a significant rate of incomplete observations. In addition, potential risk factors associated with death during TB treatment were investigated.
We used upweighting to address the extreme imbalance in responses and multiple imputation analysis (MIA) to address missing data. Three algorithms were employed for TB treatment outcome prediction: logistic regression (LOGIT), random forest, and stochastic gradient boosting. The three models exhibited similar performance in predicting the treatment outcomes. In addition, the LOGIT model was interpreted by computing adjusted odds ratios (aORs), and the interpretation results were compared between MIA and complete case analysis (CCA).
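The two preprocessing ideas named above can be illustrated with a minimal sketch. The abstract does not state which weighting scheme or pooling procedure was used, so the following are assumptions: inverse-frequency ("balanced") class weights stand in for upweighting, and Rubin's rules stand in for combining a coefficient across imputed datasets. The function names `balanced_class_weights` and `pool_rubin` are hypothetical.

```python
from collections import Counter
from statistics import mean, variance


def balanced_class_weights(labels):
    """Inverse-frequency weights: n / (k * n_c) for each class c.

    Assumption: this is one common upweighting scheme, not necessarily
    the one used in the study.
    """
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * cnt) for c, cnt in counts.items()}


def pool_rubin(estimates, variances):
    """Pool one coefficient across m imputed datasets with Rubin's rules.

    estimates: the coefficient estimated on each imputed dataset
    variances: its squared standard error on each imputed dataset
    Returns (pooled estimate, total variance).
    """
    m = len(estimates)
    q_bar = mean(estimates)          # pooled point estimate
    within = mean(variances)         # average within-imputation variance
    between = variance(estimates)    # between-imputation variance (m - 1 denominator)
    total = within + (1 + 1 / m) * between
    return q_bar, total


if __name__ == "__main__":
    # Toy labels: 9 survivors (0) and 1 death (1); the rare class gets the larger weight.
    print(balanced_class_weights([0] * 9 + [1]))
    # Toy coefficients from three imputations, each with variance 0.5.
    print(pool_rubin([1.0, 2.0, 3.0], [0.5, 0.5, 0.5]))
```

The weights could then be supplied as per-observation weights when fitting any of the three classifiers, and the pooled variance feeds the confidence interval of the pooled coefficient.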
MIA was an appropriate method for coping with missing data. In addition, compared to CCA, the MIA-derived LOGIT identified more covariates with a statistically significant association with TB treatment outcomes. In MIA, factors such as a TB clinical form involving both pulmonary TB and extrapulmonary TB [aOR = 3.077, 95% confidence interval (CI) = 2.994-3.163], retreatment after abandonment (aOR = 2.272, 95% CI = 2.209-2.338), and the absence of isoniazid (aOR = 2.072, 95% CI = 1.892-2.269) or rifampicin (aOR = 1.968, 95% CI = 1.746-2.218) in the treatment regimen were associated with increased odds of death.
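For readers less familiar with how an aOR and its interval relate to a logistic regression coefficient, the sketch below shows the standard Wald construction: the aOR is exp(beta) and the 95% CI is exp(beta ± 1.96 × SE). The abstract does not report coefficients or standard errors, so the standard error here is back-computed from the published interval for illustration only; the helper names are hypothetical, and the study may have used a different interval method.

```python
import math


def odds_ratio_with_ci(beta, se, z=1.96):
    """Convert a log-odds coefficient and its standard error
    into an odds ratio with a Wald confidence interval."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


def se_from_ci(lower, upper, z=1.96):
    """Recover the standard error on the log-odds scale from a reported
    Wald interval (assumes the interval is symmetric on the log scale)."""
    return (math.log(upper) - math.log(lower)) / (2 * z)


if __name__ == "__main__":
    # Reported for combined pulmonary and extrapulmonary TB:
    # aOR = 3.077, 95% CI = 2.994-3.163
    beta = math.log(3.077)
    se = se_from_ci(2.994, 3.163)
    aor, lo, hi = odds_ratio_with_ci(beta, se)
    print(f"beta={beta:.3f}, se={se:.4f}, aOR={aor:.3f}, CI=({lo:.3f}, {hi:.3f})")
```

The round trip approximately reproduces the published interval, which is consistent with an interval that is symmetric on the log-odds scale.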
In conclusion, our results shed light on the potential risk factors for death during TB treatment and suggest the use of simple yet interpretable LOGIT for the prediction of TB treatment outcomes.