School of Computer Science, Qufu Normal University, Rizhao, 27826, China.
BMC Genomics. 2023 Jul 29;24(1):426. doi: 10.1186/s12864-023-09515-x.
Comprehensive analysis of multiple data sets can identify potential driver genes for various cancers. In recent years, driver gene discovery based on massive mutation data and gene interaction networks has attracted increasing attention, but how to combine the functional and structural information of genes in protein interaction networks to identify driver genes remains underexplored. We therefore propose a network embedding framework that combines functional and structural information to identify driver genes. First, we combine mutation data and gene interaction networks to construct a mutation integration network using a network propagation algorithm. Second, we use the struc2vec model to extract gene features from the mutation integration network, so that the features carry both functional and structural information about each gene. Finally, we apply machine learning algorithms to identify the driver genes. Compared with four previous methods, our method can use structural similarity to find gene pairs that are distant from each other in the network, and it performs better at identifying driver genes for 12 cancers in The Cancer Genome Atlas (TCGA). We also conduct a comparative analysis of three gene interaction networks, three gene standard sets, and five machine learning algorithms. Our framework provides a new perspective on feature selection for identifying novel driver genes.
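The abstract does not say which network propagation algorithm is used, so the following is only an illustrative sketch of one common choice, random walk with restart, applied to a toy gene network. The function name `propagate`, the `restart` parameter, and the four-gene path graph are all assumptions made for illustration and are not taken from the paper.

```python
def propagate(adjacency, scores, restart=0.5, tol=1e-8, max_iter=1000):
    """Random walk with restart on an undirected gene network (illustrative).

    adjacency: dict mapping each gene to the set of its neighbours.
    scores:    dict mapping genes to initial mutation scores
               (e.g. mutation frequency); missing genes count as 0.
    Returns a dict of smoothed scores that sums to 1 when the graph
    has no isolated nodes.
    """
    genes = sorted(adjacency)
    total = sum(scores.get(g, 0.0) for g in genes)
    if total <= 0:
        raise ValueError("at least one gene needs a positive score")

    # Normalise the initial scores into a restart distribution.
    p0 = {g: scores.get(g, 0.0) / total for g in genes}
    p = dict(p0)

    for _ in range(max_iter):
        nxt = {}
        for g in genes:
            # Each neighbour n splits its current mass evenly over its edges.
            inflow = sum(p[n] / len(adjacency[n]) for n in adjacency[g])
            nxt[g] = (1.0 - restart) * inflow + restart * p0[g]
        delta = sum(abs(nxt[g] - p[g]) for g in genes)
        p = nxt
        if delta < tol:
            break
    return p


# Toy example (hypothetical genes): a path A - B - C - D where only A is mutated.
toy_network = {
    "A": {"B"},
    "B": {"A", "C"},
    "C": {"B", "D"},
    "D": {"C"},
}
toy_mutations = {"A": 1.0}
smoothed = propagate(toy_network, toy_mutations)
```

In this toy case the mutation signal at `A` spreads along the path and decays with distance, so unmutated neighbours of a mutated gene still receive a non-zero score. How the paper turns such propagated scores into the mutation integration network that struc2vec consumes is not described in the abstract, so that step is left out here.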