IET Syst Biol. 2013 Oct;7(5):223-30. doi: 10.1049/iet-syb.2012.0052.
Protein complexes are a cornerstone of many biological processes. Protein-protein interaction (PPI) data enable a number of computational methods for predicting protein complexes. However, the incompleteness of PPI data significantly lowers the accuracy of these methods. In the current work, the authors develop a novel method named clustering based on multiple biological information (CMBI), which discovers protein complexes by integrating multiple biological resources: gene expression profiles, essential protein information and PPI data. First, CMBI defines the functional similarity of each pair of interacting proteins based on the edge-clustering coefficient and the Pearson correlation coefficient. Second, CMBI selects essential proteins as seeds to build the protein complexes. A redundancy-filtering procedure then eliminates redundant complexes. In addition to the essential proteins, CMBI also uses other proteins as seeds to expand protein complexes. To check the performance of CMBI, the authors compare the complexes discovered by CMBI with those found by other techniques by matching the predicted complexes against the reference complexes. The authors subsequently use GO::TermFinder to analyse the complexes predicted by the various methods. Finally, the effect of parameters T and R is investigated. The results from GO functional enrichment and matching analyses show that CMBI performs significantly better than the state-of-the-art methods.
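The two quantities behind the functional similarity of an interacting pair can be illustrated with a minimal Python sketch. The abstract does not give the exact formulas, so several things here are assumptions: the edge-clustering coefficient is taken in one common form (shared neighbours divided by min(deg(u) - 1, deg(v) - 1)), the Pearson correlation is computed over the two proteins' expression profiles, and the unweighted sum in `functional_similarity` is only a placeholder for whatever combination CMBI actually uses. All function names and the toy data are hypothetical.

```python
from math import sqrt


def edge_clustering_coefficient(adj, u, v):
    """ECC of edge (u, v), assuming the form
    |N(u) & N(v)| / min(deg(u) - 1, deg(v) - 1)."""
    common = len(adj[u] & adj[v])
    denom = min(len(adj[u]) - 1, len(adj[v]) - 1)
    # An endpoint of degree 1 cannot take part in any triangle.
    return common / denom if denom > 0 else 0.0


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    # A constant profile has no defined correlation; return 0.0 here.
    return cov / (sd_x * sd_y) if sd_x > 0 and sd_y > 0 else 0.0


def functional_similarity(adj, expr, u, v):
    """Placeholder combination (plain sum); NOT the formula from the paper."""
    return edge_clustering_coefficient(adj, u, v) + pearson(expr[u], expr[v])


# Toy PPI network as an adjacency map (protein -> set of interaction partners).
adj = {
    "A": {"B", "C"},
    "B": {"A", "C"},
    "C": {"A", "B", "D"},
    "D": {"C"},
}
# Toy expression profiles (protein -> expression level per time point).
expr = {
    "A": [1.0, 2.0, 3.0],
    "B": [2.0, 4.0, 6.0],
    "C": [3.0, 2.0, 1.0],
    "D": [1.0, 1.0, 1.0],
}

for u, v in [("A", "B"), ("A", "C"), ("C", "D")]:
    print(u, v, round(functional_similarity(adj, expr, u, v), 3))
```

In this toy network the edge A-B lies in a triangle and links two co-expressed proteins, so it scores highest, while C-D has no shared neighbours and a flat profile for D, so it scores zero. Under this sketch, edges of the first kind are the ones a seed-based expansion would tend to follow.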