Suppr超能文献

通过密集子图和误报分析进行蛋白质复合物预测

Protein complex prediction via dense subgraphs and false positive analysis.

作者信息

Hernandez Cecilia, Mella Carlos, Navarro Gonzalo, Olivera-Nappa Alvaro, Araya Jaime

机构信息

Computer Science, University of Concepción, Concepción, Chile.

Center for Biotechnology and Bioengineering (CeBiB), Department of Computer Science, University of Chile, Santiago, Chile.

出版信息

PLoS One. 2017 Sep 22;12(9):e0183460. doi: 10.1371/journal.pone.0183460. eCollection 2017.

Abstract

Many proteins work together with others in groups called complexes in order to achieve a specific function. Discovering protein complexes is important for understanding biological processes and predict protein functions in living organisms. Large-scale and throughput techniques have made possible to compile protein-protein interaction networks (PPI networks), which have been used in several computational approaches for detecting protein complexes. Those predictions might guide future biologic experimental research. Some approaches are topology-based, where highly connected proteins are predicted to be complexes; some propose different clustering algorithms using partitioning, overlaps among clusters for networks modeled with unweighted or weighted graphs; and others use density of clusters and information based on protein functionality. However, some schemes still require much processing time or the quality of their results can be improved. Furthermore, most of the results obtained with computational tools are not accompanied by an analysis of false positives. We propose an effective and efficient mining algorithm for discovering highly connected subgraphs, which is our base for defining protein complexes. Our representation is based on transforming the PPI network into a directed acyclic graph that reduces the number of represented edges and the search space for discovering subgraphs. Our approach considers weighted and unweighted PPI networks. We compare our best alternative using PPI networks from Saccharomyces cerevisiae (yeast) and Homo sapiens (human) with state-of-the-art approaches in terms of clustering, biological metrics and execution times, as well as three gold standards for yeast and two for human. Furthermore, we analyze false positive predicted complexes searching the PDBe (Protein Data Bank in Europe) database in order to identify matching protein complexes that have been purified and structurally characterized. Our analysis shows that more than 50 yeast protein complexes and more than 300 human protein complexes found to be false positives according to our prediction method, i.e., not described in the gold standard complex databases, in fact contain protein complexes that have been characterized structurally and documented in PDBe. We also found that some of these protein complexes have recently been classified as part of a Periodic Table of Protein Complexes. The latest version of our software is publicly available at http://doi.org/10.6084/m9.figshare.5297314.v1.

摘要

许多蛋白质会与其他蛋白质以称为复合物的组合形式协同工作,以实现特定功能。发现蛋白质复合物对于理解生物过程以及预测生物体中的蛋白质功能至关重要。大规模和高通量技术使得编译蛋白质-蛋白质相互作用网络(PPI网络)成为可能,这些网络已被用于多种检测蛋白质复合物的计算方法中。这些预测可能会指导未来的生物学实验研究。一些方法基于拓扑结构,其中高度连接的蛋白质被预测为复合物;一些方法提出了不同的聚类算法,使用划分、聚类之间的重叠来处理未加权或加权图建模的网络;还有一些方法使用聚类密度和基于蛋白质功能的信息。然而,一些方案仍然需要大量处理时间,或者其结果质量有待提高。此外,大多数通过计算工具获得的结果都没有伴随对假阳性的分析。我们提出了一种有效且高效的挖掘算法来发现高度连接的子图,这是我们定义蛋白质复合物的基础。我们的表示方法基于将PPI网络转换为有向无环图,这减少了表示的边数以及发现子图的搜索空间。我们的方法考虑了加权和未加权的PPI网络。我们将使用来自酿酒酵母(酵母)和智人(人类)的PPI网络的最佳替代方案与最先进的方法在聚类、生物学指标和执行时间方面进行比较,同时还与酵母的三个金标准和人类的两个金标准进行比较。此外,我们通过搜索欧洲蛋白质数据库(PDBe)来分析预测的假阳性复合物,以识别已被纯化并进行结构表征的匹配蛋白质复合物。我们的分析表明,根据我们的预测方法,超过50个酵母蛋白质复合物和超过300个人类蛋白质复合物被发现为假阳性,即在金标准复合物数据库中未被描述,但实际上包含已在PDBe中进行结构表征和记录的蛋白质复合物。我们还发现,其中一些蛋白质复合物最近被归类为蛋白质复合物周期表的一部分。我们软件的最新版本可在http://doi.org/10.6084/m9.figshare.5297314.v1上公开获取。

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9308/5609739/19741ba62e35/pone.0183460.g001.jpg

文献AI研究员

20分钟写一篇综述,助力文献阅读效率提升50倍。

立即体验

用中文搜PubMed

大模型驱动的PubMed中文搜索引擎

马上搜索

文档翻译

学术文献翻译模型,支持多种主流文档格式。

立即体验