Department of Laboratory Medicine, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Jiefang Road 1095, Wuhan, 430030, China.
Department of Immunology, School of Basic Medicine, Tongji Medical College, Huazhong University of Science and Technology, Hangkong Road 13, Wuhan, China.
BMC Infect Dis. 2022 Dec 29;22(1):965. doi: 10.1186/s12879-022-07954-7.
The discrimination between active tuberculosis (ATB) and latent tuberculosis infection (LTBI) remains challenging. The present study aims to investigate the value of diagnostic models established by machine learning based on multiple laboratory data for distinguishing Mycobacterium tuberculosis (Mtb) infection status.
T-SPOT assays, lymphocyte characteristic detection, and routine laboratory tests were performed on all participants. Diagnostic models were built using a range of machine learning algorithms.
A total of 892 participants (468 ATB and 424 LTBI) were enrolled at Tongji Hospital (discovery cohort), and another 263 participants (125 ATB and 138 LTBI) were enrolled at Sino-French New City Hospital (validation cohort). Receiver operating characteristic (ROC) curve analysis showed that individual indicators had limited value for differentiating ATB from LTBI (area under the ROC curve (AUC) < 0.8). A total of 28 models were successfully established using machine learning, and 25 of them had AUCs above 0.9 in the test set. The conditional random forests (cforest) model, an implementation of the random forest and bagging ensemble algorithms that uses conditional inference trees as base learners, showed the best discriminative power in separating ATB from LTBI. Specifically, the cforest model achieved an AUC of 0.978, with a sensitivity of 93.39% and a specificity of 91.18%. The Mtb-specific response, represented by early secreted antigenic target 6 (ESAT-6) and culture filtrate protein 10 (CFP-10) spot-forming cells (SFC) in the T-SPOT assay, and global adaptive immunity, assessed by CD4 cell IFN-γ secretion, CD8 cell IFN-γ secretion, and CD4 cell number, contributed most to the cforest model. The superior performance obtained in the discovery cohort was confirmed in the validation cohort, where the sensitivity and specificity of the cforest model were 92.80% and 89.86%, respectively.
The cforest model developed using machine learning could serve as a valuable and promising tool for identifying Mtb infection status. The present study offers a novel and viable approach to the clinical diagnostic application of machine learning combined with laboratory findings.
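The performance figures reported above can be related to the cohort sizes with a short calculation. The following is a minimal sketch, not the authors' code: the confusion-matrix counts (116, 9, 124, 14) are not stated in the abstract and are back-calculated here as an assumption from the validation cohort sizes (125 ATB, 138 LTBI) and the reported percentages. The helper names `sensitivity_specificity` and `auc_from_scores` are hypothetical, and the toy scores are invented purely to illustrate how an AUC is obtained from ranked model outputs.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(tp: int, fn: int, tn: int, fp: int) -> Tuple[float, float]:
    """Return (sensitivity, specificity) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)  # fraction of ATB cases correctly identified
    specificity = tn / (tn + fp)  # fraction of LTBI cases correctly identified
    return sensitivity, specificity


def auc_from_scores(pos: Sequence[float], neg: Sequence[float]) -> float:
    """AUC as the probability that a random positive case scores higher
    than a random negative case (ties count as 0.5)."""
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Assumed counts, back-calculated from the validation cohort (125 ATB, 138 LTBI)
sens, spec = sensitivity_specificity(tp=116, fn=9, tn=124, fp=14)
print(f"sensitivity = {sens:.2%}, specificity = {spec:.2%}")

# Toy scores, invented for illustration only (not study data)
toy_auc = auc_from_scores(pos=[0.9, 0.8, 0.4], neg=[0.7, 0.3, 0.2])
print(f"toy AUC = {toy_auc:.3f}")
```

Under these assumed counts, 116/125 and 124/138 reproduce the reported 92.80% and 89.86%. The pairwise AUC definition is quadratic in the number of cases, which is fine for illustration but would normally be replaced by a rank-based computation on real cohorts.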