Determining the minimum number of protein-protein interactions required to support known protein complexes.

Affiliations

Institute of Molecular and Cellular Biosciences, The University of Tokyo, 1-1-1, Yayoi, Bunkyo-ku, Tokyo 113-0032, Japan.

Department of Electrical Engineering and Computer Science, National Institute of Technology, Matsue College, 14-4, Nishiikumacho, Matsue, Shimane 690-8518, Japan.

Publication information

PLoS One. 2018 Apr 26;13(4):e0195545. doi: 10.1371/journal.pone.0195545. eCollection 2018.

Abstract

The prediction of protein complexes from protein-protein interactions (PPIs) is a well-studied problem in bioinformatics. However, the currently available PPI data is not enough to describe all known protein complexes. In this paper, we express the problem of determining the minimum number of (additional) required protein-protein interactions as a graph theoretic problem under the constraint that each complex constitutes a connected component in a PPI network. For this problem, we develop two computational methods: one is based on integer linear programming (ILPMinPPI) and the other one is based on an existing greedy-type approximation algorithm (GreedyMinPPI) originally developed in the context of communication and social networks. Since the former method is only applicable to datasets of small size, we apply the latter method to a combination of the CYC2008 protein complex dataset and each of eight PPI datasets (STRING, MINT, BioGRID, IntAct, DIP, BIND, WI-PHI, iRefIndex). The results show that the minimum number of additional required PPIs ranges from 51 (STRING) to 964 (BIND), and that even the four best PPI databases, STRING (51), BioGRID (67), WI-PHI (93) and iRefIndex (85), do not include enough PPIs to form all CYC2008 protein complexes. We also demonstrate that the proposed problem framework and our solutions can enhance the prediction accuracy of existing PPI prediction methods. ILPMinPPI can be freely downloaded from http://sunflower.kuicr.kyoto-u.ac.jp/~nakajima/.
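The constraint described in the abstract, that every complex must induce a connected subgraph of the PPI network, can be illustrated with a small sketch. The code below is not ILPMinPPI or GreedyMinPPI; it is a simple greedy heuristic written for illustration, and the function names (`components`, `total_deficit`, `greedy_additional_edges`) and the toy data are assumptions. It repeatedly adds the protein pair that joins separate components in the largest number of complexes, so the result is an upper bound on the minimum number of additional PPIs, not necessarily the minimum itself.

```python
from itertools import combinations


def find(parent, x):
    """Union-find lookup with path halving."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def components(members, edges):
    """Label each protein of a complex by its component in the induced subgraph."""
    parent = {p: p for p in members}
    for u, v in edges:
        if u in parent and v in parent:  # keep only edges inside the complex
            parent[find(parent, u)] = find(parent, v)
    return {p: find(parent, p) for p in members}


def total_deficit(complexes, edges):
    """Sum over complexes of (number of induced components - 1)."""
    return sum(len(set(components(c, edges).values())) - 1 for c in complexes)


def greedy_additional_edges(complexes, edges):
    """Greedily add edges until every complex induces a connected subgraph."""
    edges = [tuple(e) for e in edges]
    added = []
    while True:
        gain = {}  # candidate pair -> number of complexes it would help connect
        for c in complexes:
            label = components(c, edges)
            for u, v in combinations(sorted(c), 2):
                if label[u] != label[v]:
                    gain[(u, v)] = gain.get((u, v), 0) + 1
        if not gain:  # no pair is split anywhere: all complexes are connected
            return added
        best = max(sorted(gain), key=gain.get)  # ties broken alphabetically
        edges.append(best)
        added.append(best)


# Toy data (hypothetical): two overlapping complexes and one known PPI.
complexes = [{"A", "B", "C"}, {"B", "C", "D"}]
ppis = [("A", "B")]
added = greedy_additional_edges(complexes, ppis)
print(total_deficit(complexes, ppis), added)
```

In this toy case the per-complex deficits sum to 3, but the shared pair (B, C) repairs both complexes at once, so two added interactions suffice. This sharing between overlapping complexes is why the minimum cannot be obtained by simply summing the missing edges per complex.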
