Zhang Tiejun, Zhang Di
GMU-GIBH Joint School of Life Sciences, Guangzhou Medical University, Guangzhou, Guangdong 511436, China.
School of Computer Science and Technology, Anhui University, Hefei, Anhui 230601, China.
Oncotarget. 2017 Jul 22;8(35):58050-58060. doi: 10.18632/oncotarget.19481. eCollection 2017 Aug 29.
Although numerous approaches have been proposed to distinguish driver from passenger mutations, identifying driver genes remains a critical challenge in cancer genomics. Driver genes with low mutation frequency tend to be filtered out in cancer research. In addition, the accumulation of different omics data necessitates algorithmic frameworks for nominating putative driver genes. In this study, we present a novel framework that identifies driver genes by integrating multi-omics data: somatic mutations, gene expression, and copy number alterations. We developed a computational approach that detects potential driver genes by virtue of their effect on their neighbors in a network. We applied it to three datasets from The Cancer Genome Atlas (TCGA): head and neck squamous cell carcinoma (HNSC), thyroid carcinoma (THCA), and kidney renal clear cell carcinoma (KIRC). Measured by Precision, Recall, and F1 score, our method outperformed DriverNet and MUFFINN on all three datasets. In addition, our method was less affected by protein length than DriverNet. Lastly, our method not only identified known cancer genes but also detected potential rare drivers in THCA, KIRC, and HNSC.
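The two ideas in the abstract, scoring a gene by its effect on network neighbors and evaluating a ranked list by Precision, Recall, and F1, can be illustrated with a minimal Python sketch. It is not the authors' algorithm: the scoring rule (counting dysregulated neighbors in patients where the gene is altered), the function names, the gene and patient labels (`G1`, `P1`, ...), and the reference set are all hypothetical and chosen only for illustration.

```python
from typing import Dict, Iterable, Set, Tuple


def neighbor_effect_scores(
    altered: Dict[str, Set[str]],
    dysregulated: Dict[str, Set[str]],
    network: Dict[str, Set[str]],
) -> Dict[str, int]:
    """Toy score (an assumption, not the published method).

    For each gene, count the (patient, neighbor) pairs in which the gene is
    altered in that patient and a network neighbor of the gene is
    dysregulated in the same patient.
    """
    scores: Dict[str, int] = {}
    for patient, genes in altered.items():
        outliers = dysregulated.get(patient, set())
        for gene in genes:
            hits = len(network.get(gene, set()) & outliers)
            scores[gene] = scores.get(gene, 0) + hits
    return scores


def precision_recall_f1(
    predicted: Iterable[str], reference: Set[str]
) -> Tuple[float, float, float]:
    """Precision, Recall, and F1 of a predicted gene list against a reference set."""
    predicted_set = set(predicted)
    true_positives = len(predicted_set & reference)
    precision = true_positives / len(predicted_set) if predicted_set else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total > 0 else 0.0
    return precision, recall, f1


# Hypothetical toy data: patient -> genes altered (mutation or copy number change).
altered = {"P1": {"G1", "G2"}, "P2": {"G1"}, "P3": {"G3"}}
# Hypothetical toy data: patient -> genes with outlying expression.
dysregulated = {"P1": {"G4", "G5"}, "P2": {"G4"}, "P3": set()}
# Hypothetical interaction network: gene -> neighboring genes.
network = {"G1": {"G4", "G5"}, "G2": {"G5"}, "G3": {"G4"}}

scores = neighbor_effect_scores(altered, dysregulated, network)
# Rank by descending score, breaking ties alphabetically for reproducibility.
ranked = sorted(scores, key=lambda gene: (-scores[gene], gene))

# Hypothetical reference set of known cancer genes.
reference = {"G1", "G3"}
metrics = precision_recall_f1(ranked[:2], reference)
print(ranked, metrics)
```

In this sketch a gene altered in only a few patients can still rank highly if its neighbors are consistently dysregulated in those patients, which is the intuition behind recovering rare drivers from network effects rather than mutation frequency alone.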