Effectiveness and Efficiency: Label-Aware Hierarchical Subgraph Learning for Protein-Protein Interaction.

Author information

Zhou Yuanqing, Lin Haitao, Xie Lianghua, Huang Yufei, Wu Lirong, Li Stan Z, Chen Wei

Affiliations

Department of Food Science and Nutrition, College of Biosystems Engineering and Food Science, Zhejiang University, Hangzhou 310058, China; AI Lab, Research Center for Industries of the Future, Westlake University, Hangzhou 310024, China.

AI Lab, Research Center for Industries of the Future, Westlake University, Hangzhou 310024, China.

Publication information

J Mol Biol. 2025 Mar 15;437(6):168737. doi: 10.1016/j.jmb.2024.168737. Epub 2024 Aug 3.

Abstract

The study of protein-protein interactions (PPIs) is central to understanding biological activities and to drug discovery and disease diagnosis. Deep learning methods, including graph neural networks (GNNs), are widely used for PPI prediction, but their performance often declines in real-world settings. Our analysis identifies the topological shortcut as one of the key causes: when PPIs are modeled as a graph with proteins as nodes and interactions as typed edges, prevailing models tend to learn patterns in node degree rather than intrinsic sequence-structure profiles. In addition, the rapid growth of PPI data incurs heavy computational costs that strain computing devices and make these methods infeasible in practice. To address these problems, we propose a label-aware hierarchical subgraph learning method (laruGL-PPI) that infers PPIs effectively while remaining interpretable. Specifically, we introduce edge-based subgraph sampling to alleviate both topological shortcuts and high computing costs. The inner-outer connections of PPIs are modeled as a hierarchical graph, and the dependencies between interaction types are captured by a label graph. Extensive experiments across PPI datasets of various scales show that laruGL-PPI outperforms state-of-the-art PPI prediction methods, particularly when tested on unseen proteins. The model can also recognize crucial protein sites, such as surface sites for binding and active sites for catalysis.
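The abstract does not specify how the edge-based subgraph sampling is implemented. As a rough illustration of the general idea only, the sketch below draws edges uniformly at random from a toy typed PPI graph and keeps the nodes they touch. The function names, the toy proteins, and the uniform sampling scheme are all assumptions made for this example and are not taken from laruGL-PPI.

```python
import random
from collections import defaultdict


def sample_edge_subgraph(edges, num_edges, seed=0):
    """Pick `num_edges` edges uniformly without replacement and return
    (sorted node list, sampled edge list). Hypothetical helper, not the
    authors' sampler."""
    rng = random.Random(seed)
    sampled = rng.sample(edges, min(num_edges, len(edges)))
    nodes = sorted({p for u, v, _ in sampled for p in (u, v)})
    return nodes, sampled


def node_degrees(edges):
    """Count how many edges touch each node."""
    deg = defaultdict(int)
    for u, v, _ in edges:
        deg[u] += 1
        deg[v] += 1
    return dict(deg)


# Toy PPI graph: (protein_a, protein_b, interaction_type). All names are made up.
toy_edges = [
    ("P1", "P2", "binding"),
    ("P1", "P3", "binding"),
    ("P1", "P4", "catalysis"),
    ("P1", "P5", "activation"),
    ("P2", "P3", "binding"),
    ("P4", "P5", "inhibition"),
]

nodes, sub_edges = sample_edge_subgraph(toy_edges, num_edges=3, seed=0)
full_deg = node_degrees(toy_edges)
sub_deg = node_degrees(sub_edges)

print("sampled nodes:", nodes)
print("full-graph degrees:", full_deg)
print("subgraph degrees:", sub_deg)
```

Each sampled subgraph contains only a fraction of the edges, so a training step on it needs less memory than one on the full graph, and a node's degree inside the subgraph is at most its degree in the full graph. Whether and how this weakens a model's reliance on node degree is the paper's claim; this toy example does not demonstrate it.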
