LIX, École Polytechnique, IP Paris, Rte de Saclay, Palaiseau, 91120, France.
Division of Artificial Intelligence in Medicine, Cedars-Sinai Medical Center, 116 N. Robertson Boulevard, Los Angeles, CA 90048, United States.
Bioinformatics. 2023 Nov 1;39(11). doi: 10.1093/bioinformatics/btad643.
The identification of cancer genes is a critical yet challenging problem in cancer genomics research. Existing computational methods, including deep graph neural networks, either fail to exploit multilayered gene-gene interactions or provide only limited explanations for their predictions. These methods are restricted to a single biological network, which cannot capture the full complexity of tumorigenesis. Models trained on different biological networks often yield different, and even opposite, cancer gene predictions, which hinders their trustworthy adoption. Here, we introduce an Explainable Multilayer Graph Neural Network (EMGNN) approach that identifies cancer genes by leveraging multiple gene-gene interaction networks and pan-cancer multi-omics data. Unlike conventional graph learning on a single biological network, EMGNN uses a multilayered graph neural network to learn from multiple biological networks for accurate cancer gene prediction.
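The idea of learning from several networks defined over the same genes can be illustrated with a minimal NumPy sketch. It is not the EMGNN architecture: it assumes a single GCN-style propagation step with a shared weight matrix per network and a plain average across networks as the combination step. The function names, toy edge lists, and dimensions are all hypothetical.

```python
import numpy as np

def adjacency_from_edges(n, edges):
    """Build a symmetric 0/1 adjacency matrix from an undirected edge list."""
    adj = np.zeros((n, n))
    for i, j in edges:
        adj[i, j] = adj[j, i] = 1.0
    return adj

def normalize_adjacency(adj):
    """Symmetrically normalize A + I, as in a standard GCN layer."""
    a_hat = adj + np.eye(adj.shape[0])   # add self-loops, so every degree is >= 1
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def multilayer_embed(adjs, features, weight):
    """One propagation step per network, then an average across networks.

    Averaging is an illustrative assumption, not the published method.
    """
    per_network = [
        np.maximum(normalize_adjacency(a) @ features @ weight, 0.0)  # ReLU
        for a in adjs
    ]
    return np.mean(per_network, axis=0)

# Toy data: 5 genes, 4 omics features each, two interaction networks.
rng = np.random.default_rng(0)
n_genes, n_feats, n_hidden = 5, 4, 3
features = rng.normal(size=(n_genes, n_feats))
weight = rng.normal(size=(n_feats, n_hidden))

network_a = adjacency_from_edges(n_genes, [(0, 1), (1, 2), (3, 4)])
network_b = adjacency_from_edges(n_genes, [(0, 4), (1, 3), (2, 3)])

embeddings = multilayer_embed([network_a, network_b], features, weight)
print(embeddings.shape)
```

Each gene ends up with one embedding that reflects its neighbourhood in both networks, so a downstream classifier no longer depends on the choice of a single network.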
Our method consistently outperforms all existing methods, with an average 7.15% improvement in area under the precision-recall curve over the current state-of-the-art method. Importantly, EMGNN integrates multiple graphs to prioritize newly predicted cancer genes for which models trained on single biological networks give conflicting predictions. For each prediction, EMGNN provides biological insights through both model-level feature importance explanations and molecular-level gene set enrichment analysis. Overall, EMGNN offers a powerful new paradigm of graph learning by modeling multilayered topological gene relationships and provides a valuable tool for cancer genomics research.
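For readers unfamiliar with the metric, the area under the precision-recall curve is often estimated as average precision: the mean of the precision values at the rank of each true positive. The sketch below shows that estimator on made-up labels and scores. It is an illustration only; the abstract does not state which estimator the authors used, and the function name and toy data are hypothetical.

```python
import numpy as np

def average_precision(labels, scores):
    """Average precision: mean precision at the rank of each positive.

    Assumes binary labels (1 = known cancer gene) with at least one
    positive; tied scores are broken by input order.
    """
    labels = np.asarray(labels)
    order = np.argsort(-np.asarray(scores), kind="stable")  # highest score first
    ranked = labels[order]
    hits = np.cumsum(ranked)                          # positives found so far
    precision_at_k = hits / np.arange(1, len(ranked) + 1)
    return float((precision_at_k * ranked).sum() / ranked.sum())

# Toy example: two positives among five genes.
labels = [1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.3, 0.1]
ap = average_precision(labels, scores)
print(round(ap, 4))  # → 0.8333
```

Here the positives sit at ranks 1 and 3, giving precisions of 1 and 2/3, whose mean is 5/6. Because cancer genes are a small minority of all genes, this metric is more informative than accuracy for this kind of task.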
Our code is publicly available at https://github.com/zhanglab-aim/EMGNN.