Wu Min, Li Xiao-Li, Kwoh Chee-Keong, Ng See-Kiong, Wong Limsoon
School of Computer Engineering, Nanyang Technological University, Singapore.
J Comput Biol. 2012 Sep;19(9):1027-42. doi: 10.1089/cmb.2010.0293. Epub 2011 Jul 21.
Many cellular functions involve protein complexes that are formed by multiple interacting proteins. Tandem Affinity Purification (TAP) is a popular experimental method for detecting such multi-protein interactions. However, current computational methods that predict protein complexes from TAP data require converting the co-complex relationships in TAP data into binary interactions. The resulting pairwise protein-protein interaction (PPI) network is then mined for densely connected regions that are identified as putative protein complexes. Converting the TAP data into PPI data not only introduces errors but also loses useful information about the underlying multi-protein relationships that can be exploited to detect the internal organization (i.e., core-attachment structures) of protein complexes. In this article, we propose a method called CACHET that detects protein complexes with Core-AttaCHment structures directly from bipartitE TAP data. CACHET models the TAP data as a bipartite graph in which the two vertex sets are the baits and the preys, respectively. The edges between the two vertex sets represent bait-prey relationships. CACHET first focuses on detecting high-quality protein-complex cores from the bipartite graph. To minimize the effects of false positive interactions, the bait-prey relationships are indexed with reliability scores. Only non-redundant, reliable bicliques computed from the TAP bipartite graph are regarded as protein-complex cores. CACHET then constructs protein complexes by adding attachment proteins to the cores. We applied CACHET to large-scale TAP datasets and found that CACHET outperformed existing methods in terms of prediction accuracy (i.e., F-measure and functional homogeneity of predicted complexes). In addition, the protein complexes predicted by CACHET are equipped with core-attachment structures that provide useful biological insights into the inherent functional organization of protein complexes.
Our supplementary material can be found at http://www1.i2r.a-star.edu.sg/~xlli/CACHET/CACHET.htm ; binary executables can also be found there. Supplementary Material is also available at www.liebertonline.com/cmb.
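The pipeline outlined in the abstract (bipartite bait-prey graph, reliability filtering, bicliques as cores, then attachments) can be illustrated with a small Python sketch. This is not the CACHET implementation: the toy data, the score threshold, the brute-force biclique enumeration, and the attachment rule (a prey joins a core if it is reliably linked to at least half of the core's baits) are all assumptions made for illustration, and the redundancy filtering mentioned in the abstract is omitted.

```python
from itertools import combinations

def reliable_graph(tap, threshold):
    """Keep only bait-prey edges whose reliability score meets the threshold."""
    return {bait: {prey for prey, score in preys.items() if score >= threshold}
            for bait, preys in tap.items()}

def maximal_bicliques(graph, min_baits=2, min_preys=2):
    """Brute-force enumeration of maximal bicliques (only viable for toy inputs).

    For every bait subset, take the preys shared by all of its baits, then
    close the bait side by collecting every bait linked to all shared preys.
    """
    baits = sorted(graph)
    found = set()
    for k in range(1, len(baits) + 1):
        for subset in combinations(baits, k):
            common = set.intersection(*(graph[b] for b in subset))
            if len(common) < min_preys:
                continue
            closed = frozenset(b for b in baits if common <= graph[b])
            if len(closed) >= min_baits:
                found.add((closed, frozenset(common)))
    return found

def attachments(graph, core, min_fraction=0.5):
    """Hypothetical rule: attach preys linked to >= min_fraction of core baits."""
    core_baits, core_preys = core
    members = core_baits | core_preys
    candidates = set().union(*(graph[b] for b in core_baits)) - members
    return {p for p in candidates
            if sum(p in graph[b] for b in core_baits) / len(core_baits) >= min_fraction}

# Toy TAP data (invented): bait -> {prey: reliability score}
tap = {
    "A": {"P1": 0.9, "P2": 0.8, "P3": 0.2},
    "B": {"P1": 0.85, "P2": 0.9, "P4": 0.7},
    "C": {"P5": 0.3},
}

graph = reliable_graph(tap, threshold=0.5)
for core in maximal_bicliques(graph):
    core_baits, core_preys = core
    print(sorted(core_baits | core_preys), "+", sorted(attachments(graph, core)))
```

On this toy input the low-scoring edges (A-P3, C-P5) are dropped before any core is computed, so an unreliable purification cannot contribute to a core. A realistic dataset would need a proper maximal-biclique algorithm in place of the exponential subset enumeration used here.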