A supervised protein complex prediction method with network representation learning and gene ontology knowledge.

Author information

School of Information Science and Technology, Dalian Maritime University, Dalian, 116024, Liaoning, China.

Department of Neurology and Neurological Sciences, Stanford University, Stanford, CA, 94305, USA.

Publication information

BMC Bioinformatics. 2022 Jul 25;23(1):300. doi: 10.1186/s12859-022-04850-4.

DOI:10.1186/s12859-022-04850-4

PMID:35879648

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9317086/

Abstract

BACKGROUND

Protein complexes are essential for understanding cell organization and function. In recent years, predicting complexes from protein-protein interaction (PPI) networks with computational methods has become a research hotspot, and many prediction methods have been proposed. However, how to make use of the information in known protein complexes remains a fundamental open problem in protein complex prediction.

RESULTS

To address this problem, we propose a supervised learning method based on network representation learning and gene ontology knowledge that makes full use of the information in known protein complexes to predict new ones. The method first constructs a weighted PPI network from gene ontology knowledge and topological information, which reduces the noise in the network. On this basis, the topological information of known protein complexes is extracted as features, and a supervised learning model, SVCC, is trained on these features. The SVCC model is then used to predict candidate protein complexes from the PPI network. Next, we use a network representation learning method to obtain vector representations of protein complexes and train a random forest model. Finally, we use the random forest model to classify the candidate protein complexes and obtain the final predicted complexes. We evaluate the performance of the proposed method on two publicly available PPI data sets.
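The first two steps (weighting the PPI network, then extracting topological features from a complex) can be illustrated with a minimal Python sketch. The abstract does not give the actual weighting formula or feature set, so everything here is an assumption for illustration: Jaccard overlap of annotation sets stands in for the gene ontology similarity, Jaccard overlap of closed neighborhoods stands in for the topological term, `alpha` is a hypothetical mixing parameter, and the three features (size, density, mean internal weight) are only plausible examples. The protein names and annotation labels are toy data.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def build_weighted_network(edges, go_terms, alpha=0.5):
    """Weight each edge by a blend of annotation overlap and neighborhood overlap.

    Assumption: this blend is illustrative, not the weighting used in the paper.
    """
    neighbors = {}
    for u, v in edges:
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)

    weights = {}
    for u, v in edges:
        go_sim = jaccard(go_terms.get(u, set()), go_terms.get(v, set()))
        # Closed neighborhoods include the node itself, so adjacent nodes
        # always share at least their two endpoints.
        topo_sim = jaccard(neighbors[u] | {u}, neighbors[v] | {v})
        weights[frozenset((u, v))] = alpha * go_sim + (1 - alpha) * topo_sim
    return weights


def complex_features(members, weights):
    """Return [size, density, mean internal edge weight] for a set of proteins."""
    pairs = list(combinations(sorted(members), 2))
    internal = [weights[frozenset(p)] for p in pairs if frozenset(p) in weights]
    density = len(internal) / len(pairs) if pairs else 0.0
    mean_weight = sum(internal) / len(internal) if internal else 0.0
    return [len(members), density, mean_weight]


# Toy network: a triangle A-B-C with a pendant protein D attached to C.
edges = [("A", "B"), ("B", "C"), ("A", "C"), ("C", "D")]
go_terms = {"A": {"t1", "t2"}, "B": {"t1", "t2"}, "C": {"t1"}, "D": {"t9"}}

weights = build_weighted_network(edges, go_terms)
features = complex_features({"A", "B", "C"}, weights)
print(features)
```

In this toy network the edge to D, which shares no annotations with C and few neighbors, receives the lowest weight, which is the kind of noise reduction the weighting step aims for. Feature vectors like `features`, computed for known complexes and for non-complex subgraphs, are what a supervised model would be trained on.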

CONCLUSIONS

Experimental results show that our method improves protein complex recognition performance compared with existing methods. We also analyze the biological significance of the protein complexes predicted by our method and by other methods; the results show that the complexes predicted by our method have high biological significance.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8145/9317086/193283ead836/12859_2022_4850_Fig1_HTML.jpg

Similar articles

A supervised protein complex prediction method with network representation learning and gene ontology knowledge.

BMC Bioinformatics. 2022 Jul 25;23(1):300. doi: 10.1186/s12859-022-04850-4.

Identifying protein complexes based on node embeddings obtained from protein-protein interaction networks.

BMC Bioinformatics. 2018 Sep 21;19(1):332. doi: 10.1186/s12859-018-2364-2.

Identification of protein complexes from multi-relationship protein interaction networks.

Hum Genomics. 2016 Jul 25;10 Suppl 2(Suppl 2):17. doi: 10.1186/s40246-016-0069-z.

Molecular complex detection in protein interaction networks through reinforcement learning.

BMC Bioinformatics. 2023 Aug 2;24(1):306. doi: 10.1186/s12859-023-05425-7.

Protein complex prediction in large ontology attributed protein-protein interaction networks.

IEEE/ACM Trans Comput Biol Bioinform. 2013 May-Jun;10(3):729-41. doi: 10.1109/TCBB.2013.86.

Protein Complexes Detection Based on Semi-Supervised Network Embedding Model.

IEEE/ACM Trans Comput Biol Bioinform. 2021 Mar-Apr;18(2):797-803. doi: 10.1109/TCBB.2019.2944809. Epub 2021 Apr 8.

Predicting protein complex in protein interaction network - a supervised learning based method.

BMC Syst Biol. 2014;8 Suppl 3(Suppl 3):S4. doi: 10.1186/1752-0509-8-S3-S4. Epub 2014 Oct 22.

A multi-network clustering method for detecting protein complexes from multiple heterogeneous networks.

BMC Bioinformatics. 2017 Dec 1;18(Suppl 13):463. doi: 10.1186/s12859-017-1877-4.

From communities to protein complexes: A local community detection algorithm on PPI networks.

PLoS One. 2022 Jan 27;17(1):e0260484. doi: 10.1371/journal.pone.0260484. eCollection 2022.

Temporal Protein Complex Identification Based on Dynamic Heterogeneous Protein Information Network Representation Learning.

IEEE/ACM Trans Comput Biol Bioinform. 2024 Sep-Oct;21(5):1154-1164. doi: 10.1109/TCBB.2024.3351078. Epub 2024 Oct 9.

Cited by

Integration of protein sequence and protein-protein interaction data by hypergraph learning to identify novel protein complexes.

Brief Bioinform. 2024 May 23;25(4). doi: 10.1093/bib/bbae274.

DL-PPI: a method on prediction of sequenced protein-protein interaction based on deep learning.

BMC Bioinformatics. 2023 Dec 14;24(1):473. doi: 10.1186/s12859-023-05594-5.

Construction of diagnostic and prognostic models based on gene signatures of nasopharyngeal carcinoma by machine learning methods.

Transl Cancer Res. 2023 May 31;12(5):1254-1269. doi: 10.21037/tcr-22-2700. Epub 2023 Apr 10.
