Department of Laboratory Medicine, Daping Hospital, Army Medical University, Chongqing, 400042, P.R. China.
Science in Computational Finance, Carnegie Mellon University, Pittsburgh, PA, USA.
BMC Infect Dis. 2023 Dec 16;23(1):881. doi: 10.1186/s12879-023-08531-2.
Tuberculosis (TB) is a chronic infectious disease caused by Mycobacterium tuberculosis (MTB) and is the ninth leading cause of death worldwide. Distinguishing active TB from latent TB infection remains difficult, yet making this distinction is essential for the individualized management and treatment of patients.
A total of 220 subjects, including patients with active TB (ATB, n = 97) and patients with latent TB infection (LTB, n = 113), were recruited in this study. Forty-six features, comprising routine blood indicators and the VCS parameters (volume, conductivity, light scatter) of neutrophils (NE), monocytes (MO), and lymphocytes (LY), were collected, and classification models were constructed with four machine learning algorithms: logistic regression (LR), random forest (RF), support vector machine (SVM), and k-nearest neighbor (KNN). The area under the precision-recall curve (AUPRC) and the area under the receiver operating characteristic curve (AUROC) were used to evaluate each model's performance in identifying active versus latent tuberculosis infection.
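To make the two evaluation metrics concrete, the following is a minimal, dependency-free sketch of how AUROC and AUPRC can be computed from a classifier's scores, assuming label 1 denotes ATB and label 0 denotes LTB. AUROC is computed here in its Mann-Whitney form (the probability that a random positive outscores a random negative), and AUPRC is estimated as step-wise average precision, which is one common estimator; the abstract does not state which AUPRC estimator the authors used. The function names `auroc` and `auprc` and the toy data are hypothetical and not taken from the study.

```python
from typing import Sequence


def auroc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC as the probability that a randomly chosen positive (assumed ATB = 1)
    scores higher than a randomly chosen negative (assumed LTB = 0).
    Ties count as one half (Mann-Whitney U formulation)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def auprc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUPRC estimated as step-wise average precision:
    sum over score thresholds of (recall gain) * (precision at that threshold)."""
    n_pos = sum(1 for y in y_true if y == 1)
    if n_pos == 0:
        raise ValueError("need at least one positive sample")
    # Sort samples by descending score so each threshold admits more predictions.
    pairs = sorted(zip(scores, y_true), key=lambda t: -t[0])
    tp = fp = 0
    prev_recall = 0.0
    ap = 0.0
    i = 0
    while i < len(pairs):
        threshold = pairs[i][0]
        # Consume every sample tied at this threshold before measuring.
        while i < len(pairs) and pairs[i][0] == threshold:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        recall = tp / n_pos
        precision = tp / (tp + fp)
        ap += (recall - prev_recall) * precision
        prev_recall = recall
    return ap


if __name__ == "__main__":
    # Hypothetical toy scores, not data from the study.
    y = [1, 1, 0, 1, 0, 0]
    s = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
    print(round(auroc(y, s), 3), round(auprc(y, s), 3))
```

On the toy data, one ATB case is ranked below an LTB case, so both metrics fall below 1; a perfectly separating score vector yields AUROC = 1 and AUPRC = 1, the pattern reported for LR and RF on the training set.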
After verification, LR and RF performed best among the four classifiers in the training set (AUROC = 1, AUPRC = 1), followed by SVM (AUROC = 0.967, AUPRC = 0.971) and KNN (AUROC = 0.943, AUPRC = 0.959). In the testing set, LR performed best (AUROC = 0.977, AUPRC = 0.957), followed by SVM (AUROC = 0.962, AUPRC = 0.949), RF (AUROC = 0.903, AUPRC = 0.922), and KNN (AUROC = 0.883, AUPRC = 0.901).
Machine learning classifiers built on leukocyte VCS parameters are of considerable value for distinguishing active from latent tuberculosis infection.