An Efficient Computational Model for Large-Scale Prediction of Protein-Protein Interactions Based on Accurate and Scalable Graph Embedding.

Authors

Su Xiao-Rui, You Zhu-Hong, Hu Lun, Huang Yu-An, Wang Yi, Yi Hai-Cheng

Affiliations

Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Ürümqi, China.

University of Chinese Academy of Sciences, Beijing, China.

Publication information

Front Genet. 2021 Feb 26;12:635451. doi: 10.3389/fgene.2021.635451. eCollection 2021.

DOI:10.3389/fgene.2021.635451

PMID:33719344

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7953052/

Abstract

Protein-protein interactions (PPIs) underlie the molecular mechanisms of living cells. Traditional experiments can detect PPIs accurately, but they are costly and time-consuming, so computational methods have been used to predict PPIs instead. Graphs, as an important and pervasive data carrier, are considered the most suitable structure for representing biomedical entities and their relationships. Graph embedding is the most popular approach to graph representation learning, but it usually incurs high computational and memory costs, especially on large-scale graphs. A framework that accelerates graph embedding and improves the accuracy of the resulting embeddings is therefore important for large-scale PPI prediction. In this paper, we propose LPPI, a multi-level model that improves both the quality and the speed of large-scale PPI prediction. First, basic information about each protein is collected as its attributes, including positional gene sets, motif gene sets, and immunological signatures. Second, a weighted graph is constructed by using the protein attributes to calculate node similarity. GraphZoom is then used to accelerate the embedding process by reducing the size of the weighted graph. Next, graph embedding methods learn graph topology features from the reconstructed graph. Finally, a linear logistic regression (LR) model predicts the probability that two proteins interact. LPPI achieved accuracies of 0.99997 and 0.9979 on the PPI network dataset and the GraphSAGE-PPI dataset, respectively. Further results show that LPPI is promising for large-scale PPI prediction in both accuracy and efficiency, which may also benefit the detection of interactions among other large-scale biomedical molecules.
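Two of the steps described in the abstract can be illustrated with a minimal sketch: building a weighted graph from attribute similarity, and scoring protein pairs with logistic regression. This is not the authors' implementation. The cosine similarity measure, the threshold, the Hadamard-product pair feature, and every function name below are assumptions made for illustration. The GraphZoom coarsening step and the graph embedding step are omitted, so the embeddings are simply passed in as an array.

```python
import numpy as np


def similarity_graph(attrs, threshold=0.5):
    """Build a symmetric weighted adjacency matrix from attribute vectors.

    Assumption: cosine similarity between binary attribute rows stands in
    for the node similarity mentioned in the abstract.
    """
    attrs = np.asarray(attrs, dtype=float)
    norms = np.linalg.norm(attrs, axis=1, keepdims=True)
    norms[norms == 0] = 1.0          # proteins with no attributes keep zero rows
    unit = attrs / norms
    weights = unit @ unit.T          # pairwise cosine similarity
    np.fill_diagonal(weights, 0.0)   # no self-loops
    weights[weights < threshold] = 0.0
    return weights


def pair_features(embeddings, pairs):
    """Combine two node embeddings into one edge feature (Hadamard product).

    Assumption: the abstract does not say how a pair is encoded.
    """
    pairs = np.asarray(pairs)
    return embeddings[pairs[:, 0]] * embeddings[pairs[:, 1]]


def _with_bias(features):
    # Append a constant column so the model learns an intercept.
    return np.hstack([features, np.ones((len(features), 1))])


def fit_logistic_regression(features, labels, learning_rate=0.5, steps=2000):
    """Fit a linear logistic regression by plain batch gradient descent."""
    x = _with_bias(np.asarray(features, dtype=float))
    y = np.asarray(labels, dtype=float)
    w = np.zeros(x.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(x @ w)))
        w -= learning_rate * x.T @ (p - y) / len(y)
    return w


def predict_proba(w, features):
    """Return the predicted interaction probability for each pair."""
    x = _with_bias(np.asarray(features, dtype=float))
    return 1.0 / (1.0 + np.exp(-(x @ w)))


# Toy data: four proteins, four binary attributes (hypothetical values).
attrs = [[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 1, 1], [0, 0, 1, 1]]
graph = similarity_graph(attrs)

# Placeholder embeddings; a real pipeline would learn these from the graph.
emb = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])
pairs = [(0, 1), (2, 3), (0, 2), (1, 3)]
labels = [1, 1, 0, 0]
w = fit_logistic_regression(pair_features(emb, pairs), labels)
probs = predict_proba(w, pair_features(emb, pairs))
```

In this toy setup the interacting pairs receive probabilities above 0.5 and the non-interacting pairs fall below it. In practice a library classifier such as scikit-learn's `LogisticRegression` would replace the hand-written gradient descent.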

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/eb68/7953052/9b9bb53e04d5/fgene-12-635451-g001.jpg
