Institute of Drug Metabolism and Pharmaceutical Analysis and Zhejiang Provincial Key Laboratory of Anti-Cancer Drug Research, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China.
Collaborative Innovation Center of Artificial Intelligence by MOE and Zhejiang Provincial Government, College of Computer Science and Technology, Zhejiang University, Hangzhou 310027, China.
Bioinformatics. 2022 Oct 31;38(21):4901-4907. doi: 10.1093/bioinformatics/btac622.
Identifying genes that play a causal role in cancer evolution remains one of the biggest challenges in cancer biology. High-throughput multi-omics data have accumulated over decades, yet integrating these data effectively into the identification of cancer driver genes is still a major challenge.
Here, we propose MODIG, a graph attention network (GAT)-based framework to identify cancer driver genes by combining multi-omics pan-cancer data (mutations, copy number variants, gene expression and methylation levels) with multi-dimensional gene networks. First, we established diverse types of gene relationship maps based on protein-protein interactions, gene sequence similarity, KEGG pathway co-occurrence, gene co-expression patterns and gene ontology. Then, we constructed a multi-dimensional gene network consisting of approximately 20 000 genes as nodes and five types of gene associations as multiplex edges. On this graph, we applied a GAT to model within-dimension interactions and generate a gene representation for each dimension. Moreover, we introduced a joint learning module that fuses the dimension-specific representations into general gene representations. Finally, we used the resulting gene representations to perform a semi-supervised driver gene identification task. The experimental results show that MODIG outperforms the baseline models in terms of area under the precision-recall curve and area under the receiver operating characteristic curve.
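The per-dimension attention step and the fusion step described above can be illustrated with a minimal pure-Python sketch. This is not the MODIG implementation: the toy gene features, the two example dimensions, the attention vectors `a_src` and `a_dst`, and the helper names `gat_layer` and `fuse` are all hypothetical. The sketch also assumes an identity projection in place of a learned weight matrix, and uses a softmax-weighted average as a stand-in for the joint learning module, whose exact form the abstract does not specify. The semi-supervised classifier is omitted.

```python
import math

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def leaky_relu(x, slope=0.2):
    return x if x > 0 else slope * x

def softmax(scores):
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def gat_layer(features, neighbors, a_src, a_dst):
    """Single-head graph attention over one network dimension.

    Assumption: the learned projection W is the identity, so the score is
    e_ij = LeakyReLU(a_src . h_i + a_dst . h_j), normalised over gene i's
    neighbourhood (self-loop included).
    """
    out = []
    for i, h_i in enumerate(features):
        nbrs = [i] + [j for j in neighbors.get(i, []) if j != i]
        scores = [leaky_relu(dot(a_src, h_i) + dot(a_dst, features[j]))
                  for j in nbrs]
        alphas = softmax(scores)
        # New representation: attention-weighted sum of neighbour features.
        out.append([sum(a * features[j][k] for a, j in zip(alphas, nbrs))
                    for k in range(len(h_i))])
    return out

def fuse(reps_per_dim, dim_scores):
    """Hypothetical fusion: softmax-weighted average across dimensions."""
    weights = softmax(dim_scores)
    n_genes = len(reps_per_dim[0])
    n_feats = len(reps_per_dim[0][0])
    return [[sum(w * rep[g][k] for w, rep in zip(weights, reps_per_dim))
             for k in range(n_feats)]
            for g in range(n_genes)]

# Toy data: 4 genes with 2 omics-style features each (made-up values).
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]

# Two illustrative network dimensions as adjacency lists (gene index -> neighbours).
networks = {
    "ppi": {0: [1], 1: [0, 2], 2: [1]},
    "pathway": {0: [2], 2: [0]},
}

a_src, a_dst = [0.1, 0.2], [0.3, -0.1]  # hypothetical attention parameters

# One representation per dimension, then a single fused representation per gene.
reps = {name: gat_layer(features, nbrs, a_src, a_dst)
        for name, nbrs in networks.items()}
fused = fuse(list(reps.values()), dim_scores=[0.0, 0.0])  # equal weights
```

In this sketch, a gene with no edges in any dimension (gene 3) keeps its input features, since its only neighbour is its own self-loop. In a trained model the attention parameters and the fusion weights would be learned jointly with the downstream classifier rather than fixed by hand.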
The MODIG program is available at https://github.com/zjupgx/modig. The code and data underlying this article are also available on Zenodo, at https://doi.org/10.5281/zenodo.7057241.
Supplementary data are available at Bioinformatics online.