Boumait Youssra, Ettetuani Boutaina, Chrairi Manal, Lamzouri Afaf, Chahboune Rajaa
Biology Molecular Unit, Life and Health Sciences Laboratory, Faculty of Medicine and Pharmacy, Abdelmalek Essaâdi University, Tangier 93000, Morocco.
Systems and Data Engineering Team, National School of Applied Sciences, Abdelmalek Essaâdi University, Tangier 93000, Morocco.
Genes (Basel). 2025 Jun 18;16(6):715. doi: 10.3390/genes16060715.
Latent tuberculosis infection (LTBi) affects nearly a quarter of the global population, yet current diagnostic methods are limited by low sensitivity and specificity. This study applied an integrative bioinformatics framework, incorporating machine learning techniques, to identify robust gene expression biomarkers associated with LTBi. We analyzed four publicly available transcriptomic datasets from peripheral blood mononuclear cells (PBMCs), covering latent infection, active disease, and healthy controls. Differentially expressed genes (DEGs) were identified, followed by gene ontology (GO) enrichment, functional clustering, and miRNA interaction analysis. Semantic similarity, unsupervised clustering, and pathway enrichment were then applied to refine the gene list. Key biomarkers were prioritized using receiver operating characteristic (ROC) curve analysis, with CCL2 and CXCL10 emerging as top candidates (AUC > 0.85). This multi-step approach demonstrates the potential of combining transcriptomic profiling with established machine learning and bioinformatics tools to uncover candidate biomarkers for improved LTBi detection, and provides a foundation for future experimental validation.
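The ROC-based prioritization step described in the abstract can be illustrated with a minimal sketch. It assumes each candidate gene is scored by how well its expression alone separates latent from healthy samples, using the rank-based (Mann-Whitney) definition of AUC. The function names, the gene labels `GENE_A` and `GENE_B`, and all expression values are hypothetical placeholders and are not taken from the study; the authors' actual pipeline and tools are not specified here.

```python
from typing import Dict, List, Sequence, Tuple


def roc_auc(case_values: Sequence[float], control_values: Sequence[float]) -> float:
    """Area under the ROC curve for a single marker.

    Uses the Mann-Whitney formulation: the probability that a randomly
    chosen case has a higher value than a randomly chosen control,
    counting ties as one half.
    """
    if not case_values or not control_values:
        raise ValueError("both groups need at least one sample")
    wins = 0.0
    for case in case_values:
        for control in control_values:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_values) * len(control_values))


def rank_genes_by_auc(
    cases: Dict[str, List[float]],
    controls: Dict[str, List[float]],
) -> List[Tuple[str, float]]:
    """Return (gene, AUC) pairs sorted from most to least discriminative."""
    scored = [(gene, roc_auc(cases[gene], controls[gene])) for gene in cases]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy expression values (illustrative only, not study data).
latent = {
    "GENE_A": [5.1, 6.3, 5.8, 7.0],
    "GENE_B": [3.0, 3.2, 2.9, 3.1],
}
healthy = {
    "GENE_A": [4.2, 5.5, 4.9, 4.0],
    "GENE_B": [3.1, 3.0, 3.3, 2.8],
}

if __name__ == "__main__":
    for gene, auc in rank_genes_by_auc(latent, healthy):
        # Mirror the abstract's cutoff of AUC > 0.85 for top candidates.
        flag = "candidate" if auc > 0.85 else "below cutoff"
        print(f"{gene}: AUC = {auc:.4f} ({flag})")
```

In this sketch an AUC near 0.5 means the gene does not separate the groups, while values near 0 would indicate a marker that is lower in latent samples; a real analysis would typically handle that direction explicitly and estimate AUC on held-out data.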