Xie Rongjun, Liu Longfei, Lu Xianzhou, He Chengjian, Li Guoxin
Department of General Surgery, Nanfang Hospital, The First School of Clinical Medicine, Southern Medical University, Guangzhou, China.
Department of General Surgery, Affiliated Nanhua Hospital, Hengyang Medical School, University of South China, Hengyang, China.
Front Genet. 2023 Jan 4;13:1067524. doi: 10.3389/fgene.2022.1067524. eCollection 2022.
Finding reliable diagnostic markers for gastric cancer (GC) is important. This work uses machine learning (ML) to identify GC diagnostic genes and to investigate their connection with immune cell infiltration. We downloaded eight GC-related datasets from GEO, TCGA, and GTEx. GSE13911, GSE15459, GSE19826, GSE54129, and GSE79973 served as the training set, GSE66229 as validation set A, and TCGA and GTEx as validation set B. First, differentially expressed genes (DEGs) were screened in the training set, and Gene Ontology (GO), Kyoto Encyclopedia of Genes and Genomes (KEGG), Disease Ontology (DO), and gene set enrichment analysis (GSEA) were performed. Next, candidate diagnostic genes were screened with the LASSO and SVM-RFE algorithms, and receiver operating characteristic (ROC) curves were used to evaluate their diagnostic efficacy. The infiltration characteristics of immune cells in GC samples were then analyzed with CIBERSORT, and correlation analysis was performed. Finally, mutation and survival analyses were performed for the diagnostic genes. We identified 556 DEGs, of which 207 were up-regulated and 349 were down-regulated. GO analysis significantly enriched 413 functional annotations, comprising 310 biological processes, 23 cellular components, and 80 molecular functions; six of these biological processes are closely related to immunity. KEGG analysis significantly enriched 11 signaling pathways, and DO analysis identified 244 closely related diseases. Multiple GSEA entries were closely related to immunity. Machine learning screened eight candidate diagnostic genes, and further validation identified six of them as diagnostic genes. All six diagnostic genes were mutated to some extent in GC, and five of them had prognostic value. In summary, bioinformatic analysis and machine learning yielded six diagnostic genes for gastric cancer that are closely related to immune cell infiltration and have prognostic value.
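The abstract evaluates each candidate gene with an ROC curve. As a rough illustration of what that evaluation measures, the sketch below computes the area under the ROC curve (AUC) for a single gene using the rank-based (Mann-Whitney) identity. It is not the authors' code: the function name `roc_auc`, the variable names, and the expression values are all hypothetical and chosen only for illustration.

```python
from typing import Sequence


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as the fraction of (tumor, normal) pairs ranked correctly.

    A tie between a tumor and a normal sample counts as half a correct pair.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]  # tumor samples
    neg = [s for s, y in zip(scores, labels) if y == 0]  # normal samples
    if not pos or not neg:
        raise ValueError("need at least one tumor and one normal sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical expression values for one candidate gene (not from the study).
expression = [8.1, 7.9, 6.2, 7.4, 5.0, 5.5, 6.8, 4.9]
is_tumor = [1, 1, 1, 1, 0, 0, 0, 0]

auc = roc_auc(expression, is_tumor)
print(f"AUC = {auc:.3f}")
```

An AUC near 1 means higher expression separates tumor from normal samples well. For a down-regulated gene the raw AUC falls below 0.5, so the score would be negated (or the AUC reported as 1 minus the value) before comparison.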