Zhu Qingqing, Liu Jie
Anhui Provincial Tuberculosis Institute, Hefei, Anhui, China.
Front Genet. 2023 Mar 9;14:1094099. doi: 10.3389/fgene.2023.1094099. eCollection 2023.
Pulmonary tuberculosis (PTB) is a chronic infectious disease and the most common type of TB. Although the sputum smear test is a gold standard for diagnosing PTB, it has several limitations, including low sensitivity, low specificity, and insufficient samples. The present study aimed to identify specific biomarkers of PTB and to construct a diagnostic model for PTB by combining random forest (RF) and artificial neural network (ANN) algorithms. Two publicly available TB cohorts, GSE83456 (training) and GSE42834 (validation), were retrieved from the Gene Expression Omnibus (GEO) database. A total of 45 and 61 differentially expressed genes (DEGs) were identified between the PTB and control samples, respectively, by screening the GSE83456 cohort. An RF classifier was used to identify specific biomarkers, after which an ANN-based classification model was constructed to identify PTB samples. The accuracy of the ANN model was evaluated using receiver operating characteristic (ROC) curve analysis. The proportions of 22 types of immunocytes in the PTB samples were estimated using the CIBERSORT algorithm, and the correlations between the immunocytes were determined. Differential analysis revealed that 11 DEGs were upregulated and 22 were downregulated, and the RF classifier identified 11 biomarkers specific to PTB. The weights of these biomarkers were determined, and an ANN-based classification model was subsequently constructed. The model achieved an area under the curve (AUC) of 1.000 in the training cohort and 0.946 in the validation cohort, which supports its accuracy. Altogether, the present study identified specific genetic biomarkers of PTB and constructed a highly accurate model for diagnosing PTB from blood samples. The model developed herein can serve as a reliable reference for the early detection of PTB and offers new insights into the pathogenesis of PTB.
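The AUC values reported in the abstract can be read as the probability that a randomly chosen PTB sample receives a higher model score than a randomly chosen control sample. The following is a minimal sketch of that pairwise definition, not the authors' code: the function name `roc_auc`, the labels, and the scores are all hypothetical and are not data from the study.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    A tie between a positive and a negative score counts as half a correct pair.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    correct = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(pos) * len(neg))


# Hypothetical labels (1 = PTB, 0 = control) and model scores, for illustration only.
labels = [1, 1, 1, 0, 0, 0]
# Every PTB score exceeds every control score.
perfect_scores = [0.9, 0.8, 0.7, 0.3, 0.2, 0.1]
# One PTB sample (0.25) scores below one control (0.3).
overlap_scores = [0.9, 0.8, 0.25, 0.3, 0.2, 0.1]

auc_perfect = roc_auc(labels, perfect_scores)
auc_overlap = roc_auc(labels, overlap_scores)
print(auc_perfect, auc_overlap)
```

Under this definition, an AUC of 1.000 means that the scores separate the two groups completely, while a value below 1 means that at least one control sample scored as high as or higher than a PTB sample.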