IEEE/ACM Trans Comput Biol Bioinform. 2022 Jul-Aug;19(4):2231-2240. doi: 10.1109/TCBB.2021.3063532. Epub 2022 Aug 8.
With advances in gene sequencing technologies, millions of somatic mutations have been reported over the past decades, but identifying cancer driver genes that carry oncogenic mutations in these data remains a critical and challenging research problem. In this study, we propose a network-based classification method that identifies cancer driver genes by integrating multiple types of biological information. The method constructs a cancer-specific genetic network from the human protein-protein interactome (PPI) to extract network structure attributes, and combines these attributes with biological information such as gene mutation frequency and differential expression to predict cancer driver genes accurately. Across seven cancer types, the proposed algorithm consistently achieves high prediction accuracy and outperforms existing state-of-the-art methods. In an analysis of the predictions, about 40 percent of the top 10 candidate genes overlap with the Cancer Gene Census database. Interestingly, a feature comparison indicates that the network-based features are more important than the biological features, including mutation frequency and differential gene expression. Further analyses show that integrating network structure attributes with biological information is valuable for predicting new cancer driver genes.
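The abstract does not say how the cancer-specific network is built, which network attributes are used, or which classifier is trained. The sketch below only illustrates the general idea of combining network structure attributes with biological features, under stated assumptions: (i) the cancer-specific network is taken to be the PPI subgraph induced by genes observed in that cancer type; (ii) degree and local clustering coefficient stand in for the network attributes; (iii) absolute log2 fold change stands in for differential expression; (iv) a plain logistic regression stands in for the classifier. `top_k_overlap` mirrors the kind of top-10 overlap check reported against the Cancer Gene Census. All function names, gene names (`G1` to `G6`) and values are hypothetical.

```python
import math
from collections import defaultdict


def cancer_specific_network(ppi_edges, cancer_genes):
    # Assumption: keep a PPI edge only if both genes are observed in the cancer type.
    adj = defaultdict(set)
    for u, v in ppi_edges:
        if u != v and u in cancer_genes and v in cancer_genes:
            adj[u].add(v)
            adj[v].add(u)
    return adj


def network_features(adj, gene):
    # Illustrative structure attributes: degree and local clustering coefficient.
    nbrs = adj.get(gene, set())  # .get avoids inserting unknown genes
    k = len(nbrs)
    if k < 2:
        return [float(k), 0.0]
    links = sum(1 for a in nbrs for b in nbrs if a < b and b in adj[a])
    return [float(k), 2.0 * links / (k * (k - 1))]


def gene_vector(adj, gene, mut_freq, log2fc):
    # Network attributes followed by biological attributes (hypothetical choice).
    return network_features(adj, gene) + [mut_freq.get(gene, 0.0),
                                          abs(log2fc.get(gene, 0.0))]


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(X, y, lr=0.1, epochs=500):
    # Stochastic gradient descent on the log-loss; a stand-in classifier,
    # not the paper's model.
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            g = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            w = [wj - lr * g * xj for wj, xj in zip(w, xi)]
            b -= lr * g
    return w, b


def score(w, b, x):
    # Predicted probability that a gene is a driver.
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def rank_candidates(w, b, vectors):
    # Genes sorted by predicted driver probability, highest first.
    return sorted(vectors, key=lambda g: score(w, b, vectors[g]), reverse=True)


def top_k_overlap(ranked, reference, k=10):
    # Fraction of the top-k candidates found in a reference gene set.
    top = ranked[:k]
    return sum(g in reference for g in top) / len(top) if top else 0.0


if __name__ == "__main__":
    # Toy, made-up data for illustration only.
    ppi = [("G1", "G2"), ("G1", "G3"), ("G2", "G3"), ("G1", "G4"),
           ("G4", "G5"), ("G5", "G6"), ("G6", "X")]
    genes = {"G1", "G2", "G3", "G4", "G5", "G6"}
    mut = {"G1": 0.30, "G2": 0.25, "G3": 0.02, "G4": 0.01, "G5": 0.03, "G6": 0.20}
    fc = {"G1": 2.0, "G2": -1.5, "G3": 0.1, "G4": 0.2, "G5": -0.1, "G6": 1.8}
    adj = cancer_specific_network(ppi, genes)
    vecs = {g: gene_vector(adj, g, mut, fc) for g in sorted(genes)}
    labels = {"G1": 1, "G2": 1, "G3": 0, "G4": 0, "G5": 0}
    w, b = train_logistic([vecs[g] for g in labels], list(labels.values()))
    ranked = rank_candidates(w, b, vecs)
    print(ranked, top_k_overlap(ranked, reference={"G1", "G2"}, k=3))
```

With real data the features would normally be standardized before training, and the relative importance of network versus biological features could be compared by, for example, permutation importance; both steps are omitted here for brevity.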