College of Informatics, Huazhong Agricultural University, Wuhan 430070, China.
School of Artificial Intelligence, Hubei University, Wuhan 430070, China.
Brief Bioinform. 2024 Jul 25;25(5). doi: 10.1093/bib/bbae418.
Tumorigenesis arises from the dysfunction of cancer genes, which drives uncontrolled cell proliferation through various mechanisms. A complete cancer gene catalogue will make precision oncology possible. Existing methods based on graph neural networks (GNNs) identify cancer genes effectively, but they fall short in integrating data from multiple views and in interpreting their predictions. To address these shortcomings, we propose IMVRL-GCN, an interpretable representation learning framework that captures both shared and view-specific representations from multiview data and offers insight into how cancer genes are identified. Experimental results show that IMVRL-GCN outperforms state-of-the-art cancer gene identification methods and several baselines. IMVRL-GCN further identifies 74 high-confidence novel cancer genes, and multiview data analysis highlights the pivotal roles of shared, mutation-specific, and structure-specific representations in discriminating distinctive cancer genes. Exploring the mechanisms behind this discriminative capability suggests that shared representations are strongly associated with gene function, whereas mutation-specific and structure-specific representations are linked to mutagenic propensity and functional synergy, respectively. Finally, in-depth analyses of these candidates point to possible strategies for individualized treatment: afatinib could counteract many mutation-driven risks, and targeting interactions with the cancer gene SRC is a reasonable strategy for mitigating interaction-induced risks for NR3C1, RXRA, HNF4A, and SP1.
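The abstract says that IMVRL-GCN uses a graph convolutional network to capture shared and view-specific representations from multiview data, but it does not describe the architecture. The Python sketch below is therefore only a generic illustration of that idea under stated assumptions. It is not the authors' model. It applies the standard GCN propagation rule ReLU(Â X W), where Â = D^-1/2 (A + I) D^-1/2, and makes the following choices:

- one weight matrix is shared by all views;
- each view has its own separate weight matrix;
- the shared outputs are averaged across views;
- the shared and specific outputs are concatenated into one row per gene.

All function names, the toy graph, the two "mutation" and "structure" views, and all weight values are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product a @ b for nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def relu(m: Matrix) -> Matrix:
    return [[max(0.0, x) for x in row] for row in m]


def normalize_adjacency(adj: Matrix) -> Matrix:
    """Standard GCN normalization D^-1/2 (A + I) D^-1/2 (self-loops added)."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]


def gcn_layer(a_norm: Matrix, x: Matrix, w: Matrix) -> Matrix:
    """One graph-convolution step: ReLU(A_norm @ X @ W)."""
    return relu(matmul(matmul(a_norm, x), w))


def multiview_representation(adj: Matrix, views: List[Matrix],
                             w_shared: Matrix,
                             w_specific: List[Matrix]) -> Matrix:
    """Hypothetical shared + view-specific encoding on one gene graph.

    views:      one (n_genes x d) feature matrix per view
    w_shared:   a single (d x h) weight reused by every view
    w_specific: one (d x h_v) weight per view
    Returns one row per gene: [shared | specific_1 | ... | specific_k].
    """
    a_norm = normalize_adjacency(adj)
    n, h = len(adj), len(w_shared[0])
    # Shared encoder: same weights on every view, outputs averaged across views.
    per_view = [gcn_layer(a_norm, x, w_shared) for x in views]
    shared = [[sum(p[i][j] for p in per_view) / len(views) for j in range(h)]
              for i in range(n)]
    # Specific encoders: a separate weight matrix for each view.
    specific = [gcn_layer(a_norm, x, w) for x, w in zip(views, w_specific)]
    reps = []
    for i in range(n):
        row = list(shared[i])
        for s in specific:
            row.extend(s[i])
        reps.append(row)
    return reps


# Toy example: 3 genes on a path graph 0-1-2, two made-up 2-d views.
adj = [[0.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 0.0]]
mutation_view = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
structure_view = [[0.2, 0.8], [0.9, 0.1], [0.4, 0.6]]
w_shared = [[1.0, -1.0], [0.5, 0.5]]
w_specific = [[[1.0], [0.0]], [[0.0], [1.0]]]
reps = multiview_representation(adj, [mutation_view, structure_view],
                                w_shared, w_specific)
print([len(r) for r in reps])  # [4, 4, 4]
```

Each gene ends up with two shared dimensions plus one dimension per view-specific encoder. In practice these weights would presumably be learned, for example by training a classifier on the concatenated rows against known cancer genes. Here they are fixed so that the output is easy to check by hand.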