Institute for Advanced Biosciences, Keio University, 403-1, Daihoji, Tsuruoka, Yamagata 997-0017, Japan.
BMC Bioinformatics. 2010 Jun 28;11:350. doi: 10.1186/1471-2105-11-350.
High-throughput methods for detecting protein-protein interactions (PPIs) yield large interaction networks, from which protein complexes can be identified computationally. Although methods exist to extract protein complexes as sets of proteins from interaction networks, the extracted complexes may include false positives because these methods do not account for the structural limitations of the proteins and thus do not check whether the proteins in an extracted complex can bind to each other simultaneously. In addition, few studies have sought deeper insight into protein complexes, such as the topology of their protein-protein interactions or the domain-domain interactions (DDIs) that mediate them.
Here, we introduce a combinatorial approach for predicting protein complexes that focuses not only on determining the member proteins of a complex but also on its DDI/PPI organization. Our method analyzes complex candidates predicted by existing methods. It searches for optimal combinations of domain-domain interactions in the candidates, based on the assumption that the proteins in a candidate can form a true protein complex if each domain is used by only a single protein interaction. This optimization problem was formulated mathematically and solved using binary integer linear programming. Using publicly available sets of yeast protein-protein interactions and domain-domain interactions, we succeeded in extracting protein complex candidates with an accuracy twice the average accuracy of the existing methods (MCL, MCODE, and clustering coefficient). Although tuning the parameters of each existing algorithm slightly improved its precision, our method showed better precision for most parameter values.
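To make the domain-usage assumption concrete, the following is a minimal sketch of a 0/1 selection problem of the kind described above. It is not the authors' formulation: the objective (maximize the number of selected DDIs), the extra constraint of at most one DDI per protein pair, the function name `select_ddis`, and the toy proteins and domains are all assumptions made for illustration. For self-containment it enumerates every binary assignment exhaustively instead of calling an integer linear programming solver, so it is only practical for very small candidates.

```python
from itertools import product

def select_ddis(candidate_ddis):
    """Pick a largest set of DDIs in which every domain is used at most once.

    candidate_ddis: list of ((protein, domain), (protein, domain)) tuples.
    Hypothetical helper; exhaustive search stands in for an ILP solver.
    """
    best_size, best_choice = -1, ()
    # One binary variable per candidate DDI: 1 = selected, 0 = not selected.
    for x in product((0, 1), repeat=len(candidate_ddis)):
        chosen = [ddi for ddi, xi in zip(candidate_ddis, x) if xi]
        # Constraint 1: each (protein, domain) endpoint appears at most once.
        used = [end for ddi in chosen for end in ddi]
        if len(used) != len(set(used)):
            continue
        # Constraint 2 (assumed): at most one DDI mediates each protein pair.
        pairs = [frozenset((a[0], b[0])) for a, b in chosen]
        if len(pairs) != len(set(pairs)):
            continue
        # Objective (assumed): maximize the number of selected DDIs.
        if len(chosen) > best_size:
            best_size, best_choice = len(chosen), tuple(chosen)
    return list(best_choice)

# Toy candidate: A and C each have one domain, B has two.
candidates = [
    (("A", "d1"), ("B", "d2")),
    (("A", "d1"), ("C", "d4")),
    (("B", "d3"), ("C", "d4")),
]
result = select_ddis(candidates)
```

In this toy candidate, domain d1 of A and domain d4 of C can each take part in only one interaction, so the three PPIs cannot all hold simultaneously; the search keeps A-B (via d1/d2) and B-C (via d3/d4) and drops A-C.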
Our combinatorial approach can provide better accuracy for the prediction of protein complexes and also enables identification of both the direct PPIs in a complex and the DDIs that mediate them.