Interpretable identification of cancer genes across biological networks via transformer-powered graph representation learning.

Author information

Su Xiaorui, Hu Pengwei, Li Dongxu, Zhao Bowei, Niu Zhaomeng, Herget Thomas, Yu Philip S, Hu Lun

Affiliations

Xinjiang Technical Institutes of Physics and Chemistry, Chinese Academy of Sciences, Urumqi, China.

University of Chinese Academy of Sciences, Beijing, China.

Publication information

Nat Biomed Eng. 2025 Mar;9(3):371-389. doi: 10.1038/s41551-024-01312-5. Epub 2025 Jan 9.

Abstract

Graph representation learning has been leveraged to identify cancer genes from biological networks. However, its applicability is limited by insufficient interpretability and generalizability under integrative network analysis. Here we report the development of an interpretable and generalizable transformer-based model that accurately predicts cancer genes by leveraging graph representation learning and the integration of multi-omics data with the topologies of homogeneous and heterogeneous networks of biological interactions. The model allows for the interpretation of the respective importance of multi-omic and higher-order structural features, achieved state-of-the-art performance in the prediction of cancer genes across biological networks (including networks of interactions between miRNA and proteins, transcription factors and proteins, and transcription factors and miRNA) in pan-cancer and cancer-specific scenarios, and predicted 57 cancer-gene candidates (including three genes that had not been identified by other models) among 4,729 unlabelled genes across 8 pan-cancer datasets. The model's interpretability and generalization may facilitate the understanding of gene-related regulatory mechanisms and the discovery of new cancer genes.
