Wen Jian-Xin, Li Xiao-Qin, Chang Yu
College of Life Science and Bioengineering, Beijing University of Technology, Beijing, P.R. China.
J Comput Biol. 2018 Aug;25(8):907-916. doi: 10.1089/cmb.2017.0261. Epub 2018 Jun 29.
This study aimed to identify signature genes involved in the pathogenesis of cancer, providing theoretical support for cancer prevention and early diagnosis. A pattern recognition approach was used to analyze genome-wide gene expression data collected from The Cancer Genome Atlas (TCGA) database. For the transcriptomes of seven cancers (invasive breast carcinoma, lung adenocarcinoma, lung squamous cell carcinoma, colon adenocarcinoma, renal clear-cell carcinoma, thyroid carcinoma, and hepatocellular carcinoma), signature genes were selected by combining statistical methods such as correlation analysis, the t-test, and confidence intervals. Using an artificial neural network model, classification accuracy reached up to 98% on the TCGA data and up to 92% on independent data from the Gene Expression Omnibus (GEO), and the recognition accuracy for stage I samples exceeded 95%, which is higher than in previous work. Comparing the signature genes of the seven cancers yielded two genes common to five of them: PID1 and SPTBN2. In addition, Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis identified three pathways common to these cancers. Functional analysis showed that these pathways are closely related at the level of gene regulation, indicating that the identified signature genes play an important role in cancer pathogenesis and are valuable both for understanding its mechanisms and for early diagnosis.
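The abstract names the ingredients of the gene-selection step (correlation, t-test, confidence interval) but not how they are combined or which thresholds were used. The following is a minimal Python sketch, under our own assumptions, of one common way to combine two of them: rank genes by Welch's t statistic between tumor and normal samples, then greedily drop any gene whose expression across tumor samples is highly correlated with a gene already kept. A second helper counts genes shared across per-cancer signature lists, mirroring the step that found genes common to five cancers. The thresholds `T_MIN` and `R_MAX`, all function names, the filter order, and the toy data are hypothetical; the confidence-interval step and the neural-network classifier are not modeled.

```python
from collections import Counter
from math import copysign, inf, sqrt
from statistics import mean, variance

# Hypothetical thresholds; the abstract does not report the actual criteria.
T_MIN = 5.0   # minimum |Welch t| (tumor vs normal) for a gene to be considered
R_MAX = 0.9   # maximum |Pearson r| allowed between two selected genes


def welch_t(a, b):
    """Welch's two-sample t statistic (unequal variances); needs >= 2 values per group."""
    diff = mean(a) - mean(b)
    se = sqrt(variance(a) / len(a) + variance(b) / len(b))
    if se == 0:
        return 0.0 if diff == 0 else copysign(inf, diff)
    return diff / se


def pearson_r(x, y):
    """Pearson correlation; returns 0.0 if either vector is constant."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def select_signature(tumor, normal, t_min=T_MIN, r_max=R_MAX):
    """Greedy t-test + correlation filter (hypothetical procedure).

    tumor, normal: dict mapping gene -> list of expression values (one per sample).
    Genes are ranked by |t|; a gene is kept if |t| >= t_min and its expression
    across tumor samples is not correlated above r_max with any gene already kept.
    """
    scores = {g: welch_t(tumor[g], normal[g]) for g in tumor}
    ranked = sorted(scores, key=lambda g: abs(scores[g]), reverse=True)
    kept = []
    for g in ranked:
        if abs(scores[g]) < t_min:
            break  # remaining genes rank lower, so none can pass
        if all(abs(pearson_r(tumor[g], tumor[k])) <= r_max for k in kept):
            kept.append(g)
    return kept


def common_genes(signatures, min_cancers):
    """Genes present in at least `min_cancers` of the per-cancer signature lists."""
    counts = Counter(g for sig in signatures.values() for g in set(sig))
    return sorted(g for g, n in counts.items() if n >= min_cancers)


if __name__ == "__main__":
    # Toy data with made-up gene names; not TCGA values.
    tumor = {"A": [10, 11, 12, 11], "B": [8, 11, 14, 11],
             "C": [5.0, 5.2, 4.8, 5.1], "D": [3, 3.5, 3.2, 3.8]}
    normal = {"A": [1, 2, 1.5, 1.2], "B": [1, 2, 1.5, 1.2],
              "C": [5.1, 4.9, 5.0, 5.2], "D": [8, 8.5, 8.2, 7.9]}
    print(select_signature(tumor, normal))  # ['D', 'A']: B repeats A's tumor pattern
    print(common_genes({"c1": ["g1", "g2"], "c2": ["g1"], "c3": ["g1", "g3"]}, 2))  # ['g1']
```

Computing the redundancy correlation within tumor samples, rather than over pooled tumor and normal samples, is a deliberate choice in this sketch: with a strong tumor/normal split, almost every differentially expressed gene correlates highly with every other one across pooled samples, so a pooled filter would discard nearly all candidates.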