An End-to-End Knowledge Graph Fused Graph Neural Network for Accurate Protein-Protein Interactions Prediction.

Authors

Yang Jie, Li Yapeng, Wang Guoyin, Chen Zhong, Wu Di

Publication information

IEEE/ACM Trans Comput Biol Bioinform. 2024 Nov-Dec;21(6):2518-2530. doi: 10.1109/TCBB.2024.3486216. Epub 2024 Dec 10.

Abstract

Protein-protein interactions (PPIs) are essential to understanding cellular mechanisms, signaling networks, disease processes, and drug development, as they represent the physical contacts and functional associations between proteins. Recent advances have witnessed the achievements of artificial intelligence (AI) methods aimed at predicting PPIs. However, these approaches often handle the intricate web of relationships and mechanisms among proteins, drugs, diseases, ribonucleic acid (RNA), and protein structures in a fragmented or superficial manner. This is typically due to the limitations of non-end-to-end learning frameworks, which can lead to sub-optimal feature extraction and fusion, thereby compromising the prediction accuracy. To address these deficiencies, this paper introduces a novel end-to-end learning model, the Knowledge Graph Fused Graph Neural Network (KGF-GNN). This model comprises three integral components: (1) Protein Associated Network (PAN) Construction: We begin by constructing a PAN that extensively captures the diverse relationships and mechanisms linking proteins with drugs, diseases, RNA, and protein structures. (2) Graph Neural Network for Feature Extraction: A Graph Neural Network (GNN) is then employed to distill both topological and semantic features from the PAN, alongside another GNN designed to extract topological features directly from observed PPI networks. (3) Multi-layer Perceptron for Feature Fusion: Finally, a multi-layer perceptron integrates these varied features through end-to-end learning, ensuring that the feature extraction and fusion processes are both comprehensive and optimized for PPI prediction. Extensive experiments conducted on real-world PPI datasets validate the effectiveness of our proposed KGF-GNN approach, which not only achieves high accuracy in predicting PPIs but also significantly surpasses existing state-of-the-art models. This work not only enhances our ability to predict PPIs with a higher precision but also contributes to the broader application of AI in Bioinformatics, offering profound implications for biological research and therapeutic development.
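
The three components described in the abstract (a PAN, two GNN feature extractors, and an MLP that fuses their outputs) can be illustrated with a rough forward-pass sketch. This is not the authors' implementation: the abstract does not specify the GNN variant, so a simple mean-aggregation layer is swapped in here, and every name (`to_undirected`, `gnn_layer`, `score_pair`, `demo`), the toy nodes (`P1`, `drugA`, `diseaseX`, `rnaR`), the feature dimension, and the random weights are assumptions made for illustration. Training is omitted; in an end-to-end setup the loss on the final score would update both GNN branches and the MLP together.

```python
import math
import random


def to_undirected(edges):
    """Build an adjacency list from (u, v) pairs, treating edges as undirected."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)
    return adj


def mean_aggregate(adj, feats):
    """Average each node's feature vector with those of its neighbours."""
    out = {}
    for node, h in feats.items():
        group = [h] + [feats[n] for n in adj.get(node, [])]
        out[node] = [sum(col) / len(group) for col in zip(*group)]
    return out


def init_linear(rng, n_in, n_out):
    """Random weight matrix (n_out x n_in) and zero bias; stands in for learned weights."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def linear(vec, params):
    """Affine map: weights @ vec + bias."""
    weights, bias = params
    return [sum(w * x for w, x in zip(row, vec)) + b for row, b in zip(weights, bias)]


def relu(vec):
    return [max(0.0, x) for x in vec]


def gnn_layer(adj, feats, params):
    """One message-passing layer: mean aggregation, linear map, ReLU (assumed variant)."""
    return {n: relu(linear(h, params)) for n, h in mean_aggregate(adj, feats).items()}


def score_pair(p, q, pan_emb, ppi_emb, mlp):
    """Concatenate both branches' embeddings for p and q, then apply a 2-layer MLP."""
    fused = pan_emb[p] + ppi_emb[p] + pan_emb[q] + ppi_emb[q]  # list concatenation
    hidden = relu(linear(fused, mlp[0]))
    logit = linear(hidden, mlp[1])[0]
    return 1.0 / (1.0 + math.exp(-logit))  # sigmoid -> interaction probability


def demo(seed=0, dim=4):
    rng = random.Random(seed)
    # Hypothetical toy PAN: three proteins linked to a drug, a disease, and an RNA.
    pan_adj = to_undirected([("P1", "drugA"), ("P2", "drugA"),
                             ("P2", "diseaseX"), ("P3", "diseaseX"), ("P3", "rnaR")])
    # Hypothetical observed PPI network over the same proteins.
    ppi_adj = to_undirected([("P1", "P2"), ("P2", "P3")])
    pan_feats = {n: [rng.uniform(-1, 1) for _ in range(dim)] for n in pan_adj}
    ppi_feats = {n: [rng.uniform(-1, 1) for _ in range(dim)] for n in ppi_adj}
    pan_emb = gnn_layer(pan_adj, pan_feats, init_linear(rng, dim, dim))
    ppi_emb = gnn_layer(ppi_adj, ppi_feats, init_linear(rng, dim, dim))
    mlp = (init_linear(rng, 4 * dim, dim), init_linear(rng, dim, 1))
    return score_pair("P1", "P3", pan_emb, ppi_emb, mlp)
```

Calling `demo()` returns a value between 0 and 1 for the unobserved pair (`P1`, `P3`); with untrained random weights the number itself carries no meaning. Note that concatenation makes the score depend on the order of `p` and `q`; a symmetric combination such as an element-wise product would be one way to avoid that, though the abstract does not say which the paper uses.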

