West China Biomedical Big Data Center, West China Hospital, Sichuan University, Chengdu, China.
Medical Big Data Center, Sichuan University, Chengdu, China.
World J Surg Oncol. 2018 Nov 14;16(1):223. doi: 10.1186/s12957-018-1519-y.
Pancreatic cancer is one of the most lethal tumors, with a poor prognosis and a lack of effective biomarkers for diagnosis and treatment. The aim of this investigation was to identify hub genes in pancreatic cancer that could serve as potential biomarkers for cancer diagnosis and therapy in the future.
Two expression profiles from the Gene Expression Omnibus (GEO) database, GSE16515 and GSE22780, were combined to serve as the training set. Differentially expressed genes (DEGs) with the top 25% variance were identified, and a protein-protein interaction (PPI) network was then constructed to find candidate genes. Hub genes were further screened by survival and Cox regression analyses in The Cancer Genome Atlas (TCGA) database. Finally, the hub genes were validated in the GSE15471 dataset from GEO using two supervised learning methods: the k-nearest neighbor (kNN) and random forest algorithms.
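The top-25%-variance filter described above can be sketched as follows. This is a minimal illustration under stated assumptions: genes are ranked by their sample variance across all training samples after normalization and batch correction. The abstract does not state the exact variance measure, whether the filter was applied before or after the differential expression test, or which software was used. The function name `top_variance_genes` and the dictionary input format are hypothetical.

```python
import math
from statistics import variance


def top_variance_genes(expr, fraction=0.25):
    """Return gene IDs whose variance across samples lies in the top `fraction`.

    Hypothetical helper. `expr` maps gene ID -> list of expression values,
    one per sample. It is assumed to be already normalized and batch-corrected.
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    # Rank genes from most to least variable (sample variance, n - 1 denominator).
    ranked = sorted(expr, key=lambda gene: variance(expr[gene]), reverse=True)
    # Round the cutoff up and keep at least one gene so small inputs are not emptied.
    n_keep = max(1, math.ceil(len(ranked) * fraction))
    return ranked[:n_keep]
```

With every gene measured on the same samples, ranking by sample or population variance gives the same order, so the choice of denominator does not change which genes pass the filter.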
After quality control and batch effect removal in the training set, 181 DEGs with the top 25% variance were identified as candidate genes. Using the bioinformatics methods described above, two hub genes associated with the diagnosis and prognosis of pancreatic cancer, MMP7 and ITGA2, were then identified. Finally, the hub genes successfully distinguished tumor samples from normal tissues, with predictive accuracies of 93.59% and 81.31% using the kNN and random forest algorithms, respectively.
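The kNN validation step can be illustrated with a small sketch. It rests on several assumptions that the abstract does not specify:

- each sample is represented by the expression values of the two hub genes (MMP7, ITGA2);
- distances are Euclidean on unscaled values;
- k = 3.

The authors likely used a library implementation rather than code like this. The helper names are hypothetical, and the random forest classifier is not shown. Predictive accuracy here is the fraction of validation samples whose predicted label matches the true label.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, x, k=3):
    """Predict the label of sample x by majority vote of its k nearest training samples.

    Assumption: features are per-sample expression values of the hub genes,
    compared with plain Euclidean distance (no scaling).
    """
    # Pair each training sample's distance to x with its label, nearest first.
    neighbors = sorted((math.dist(xi, x), yi) for xi, yi in zip(train_X, train_y))
    votes = Counter(label for _, label in neighbors[:k])
    return votes.most_common(1)[0][0]


def knn_accuracy(train_X, train_y, test_X, test_y, k=3):
    """Fraction of held-out samples whose kNN prediction matches the true label."""
    correct = sum(
        knn_predict(train_X, train_y, x, k) == y for x, y in zip(test_X, test_y)
    )
    return correct / len(test_y)
```

The classifier is fit on one cohort and scored on a separate one. This mirrors the design of training on GSE16515/GSE22780 and validating on GSE15471. In practice, features are usually standardized before kNN so that one gene's scale does not dominate the distance. An odd k also avoids tied votes in a two-class problem.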
Both hub genes are associated with regulation of the tumor microenvironment, which is implicated in tumor proliferation, progression, migration, and metastasis. Our results offer a new perspective on the diagnosis and treatment of pancreatic cancer and may have further clinical applications.