A supervised protein complex prediction method with network representation learning and gene ontology knowledge.

Affiliations

School of Information Science and Technology, Dalian Maritime University, Dalian, 116024, Liaoning, China.

Department of Neurology and Neurological Sciences, Stanford University, Stanford, CA, 94305, USA.

Publication information

BMC Bioinformatics. 2022 Jul 25;23(1):300. doi: 10.1186/s12859-022-04850-4.

Abstract

BACKGROUND

Protein complexes are essential for understanding cell organization and function. In recent years, predicting complexes from protein-protein interaction (PPI) networks by computational methods has become an active research topic, and many prediction methods have been proposed. However, how to make use of the information in known protein complexes remains a fundamental open problem in protein complex prediction.

RESULTS

To address this problem, we propose a supervised learning method based on network representation learning and gene ontology knowledge that makes full use of the information in known protein complexes to predict new ones. The method first constructs a weighted PPI network from gene ontology knowledge and topological information, which reduces noise in the network. The topological information of known protein complexes is then extracted as features, and a supervised learning model, SVCC, is trained on these features. The SVCC model is used to predict candidate protein complexes from the PPI network. Next, we use a network representation learning method to obtain vector representations of protein complexes and train a random forest model. Finally, the random forest model classifies the candidate protein complexes to yield the final predicted complexes. We evaluate the proposed method on two publicly available PPI data sets.
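The first two steps of this pipeline (edge weighting and topological feature extraction) can be illustrated with a minimal Python sketch. The abstract does not give the weighting formula or the feature list, so everything specific here is an assumption made for illustration: Jaccard overlap as the similarity measure, the equal 0.5/0.5 mix of gene ontology and topological similarity, the three features returned, the function names `build_weighted_network` and `subgraph_features`, and the toy proteins and term labels. The SVCC model, the network representation learning step, and the random forest classifier are not shown.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def build_weighted_network(edges, go_terms):
    """Weight each PPI edge by annotation overlap and neighbourhood overlap.

    Assumption: the weight is an equal mix of the two Jaccard scores.
    The paper's actual weighting scheme is not stated in the abstract.
    """
    neighbours = {}
    for u, v in edges:
        neighbours.setdefault(u, set()).add(v)
        neighbours.setdefault(v, set()).add(u)

    weights = {}
    for u, v in edges:
        go_sim = jaccard(go_terms.get(u, set()), go_terms.get(v, set()))
        # Closed neighbourhoods (each node includes itself), so two
        # proteins with identical interaction partners score 1.0.
        topo_sim = jaccard(neighbours[u] | {u}, neighbours[v] | {v})
        weights[frozenset((u, v))] = 0.5 * go_sim + 0.5 * topo_sim
    return weights


def subgraph_features(nodes, weights):
    """Return (size, density, mean edge weight) for a set of proteins."""
    nodes = sorted(nodes)
    n = len(nodes)
    present = [
        weights[frozenset(pair)]
        for pair in combinations(nodes, 2)
        if frozenset(pair) in weights
    ]
    possible = n * (n - 1) / 2
    density = len(present) / possible if possible else 0.0
    mean_weight = sum(present) / len(present) if present else 0.0
    return n, density, mean_weight


# Toy data (invented): four proteins with hypothetical annotation labels.
edges = [("A", "B"), ("B", "C"), ("A", "C"), ("C", "D")]
go_terms = {
    "A": {"t1", "t2"},
    "B": {"t1", "t2"},
    "C": {"t1"},
    "D": {"t3"},
}

weights = build_weighted_network(edges, go_terms)
features = subgraph_features({"A", "B", "C"}, weights)
print(features)
```

In this toy network the edge between C and D, whose endpoints share no annotation and few neighbours, receives a lower weight than the edges inside the A, B, C triangle, which is the sense in which weighting can damp noisy interactions. Feature tuples like the one printed here are the kind of input a supervised model could be trained on.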

CONCLUSIONS

Experimental results show that our method improves protein complex prediction performance compared with existing methods. We also analyze the biological significance of the protein complexes predicted by our method and by other methods; the results show that the complexes predicted by our method have high biological significance.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8145/9317086/193283ead836/12859_2022_4850_Fig1_HTML.jpg
