Xu Ying, Zhou Jiaogen, Zhou Shuigeng, Guan Jihong
Department of Computer Science and Technology, Tongji University, Shanghai, 201804, China.
The Institute of Subtropical Agriculture, Chinese Academy of Sciences, 444 Yuandaer Road, Mapoling, Changsha, 410125, China.
BMC Syst Biol. 2017 Dec 21;11(Suppl 7):135. doi: 10.1186/s12918-017-0504-3.
Effectively predicting protein complexes not only helps us understand the structures and functions of proteins and their complexes, but is also useful for diagnosing diseases and developing new drugs. To date, many methods have been developed to detect complexes by mining dense subgraphs from static protein-protein interaction (PPI) networks, but they ignore the value of other biological information and the dynamic properties of cellular systems.
In this paper, building on our previous work CPredictor and CPredictor2.0, we present CPredictor3.0, a new method that predicts complexes from PPI networks by integrating gene expression data and protein functional annotations. The method rests on the premise that proteins in the same complex should have roughly similar functions and be active at the same time and place in cellular systems. We first detect active proteins at different time points from gene expression data, and separately cluster proteins by their gene ontology (GO) functional annotations. Then, for each time point, we intersect the set of active proteins derived from the expression data with each protein cluster derived from the functional annotations. Each distinct resulting set is a group of proteins that have similar function(s) and are active at that time point. Next, we map each such group onto a static PPI network and obtain a series of connected induced subgraphs, which we treat as candidate complexes. Finally, we expand and merge these candidates to obtain the predicted complexes. We evaluate CPredictor3.0 on several PPI networks and benchmark complex datasets and compare it with a number of existing methods. The experimental results show that CPredictor3.0 achieves the highest F1-measure, indicating that it outperforms these existing methods overall.
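The candidate-generation steps described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation: a fixed expression cutoff `threshold` stands in for the paper's active-protein detection, the GO-based clusters are given as precomputed sets, the PPI network is an adjacency dict, and the merge step uses a hypothetical overlap score |A∩B|²/(|A|·|B|) with a hypothetical threshold `omega`. The expansion step is omitted, and all data, names, and parameters (`min_size`, `omega`, `build_ppi`, and so on) are hypothetical.

```python
from collections import deque

# Hypothetical toy data: expression values at two time points,
# precomputed GO-based protein clusters, and an undirected PPI edge list.
EXPRESSION = {
    "A": [5.0, 1.0], "B": [6.0, 1.5], "C": [4.5, 7.0],
    "D": [0.5, 6.5], "E": [7.0, 8.0],
}
GO_CLUSTERS = [{"A", "B", "C"}, {"C", "D", "E"}]
EDGES = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "E")]


def build_ppi(edge_list):
    """Build an undirected adjacency dict from an edge list."""
    ppi = {}
    for u, v in edge_list:
        ppi.setdefault(u, set()).add(v)
        ppi.setdefault(v, set()).add(u)
    return ppi


def active_sets(expression, threshold):
    """Map each time point to the proteins whose expression is at or above
    `threshold` (an assumed fixed cutoff, not the paper's detection rule)."""
    n_points = len(next(iter(expression.values())))
    return {
        t: {p for p, values in expression.items() if values[t] >= threshold}
        for t in range(n_points)
    }


def connected_components(nodes, ppi):
    """Connected components (via BFS) of the subgraph induced by `nodes`."""
    seen, components = set(), []
    for start in nodes:
        if start in seen:
            continue
        component, queue = set(), deque([start])
        seen.add(start)
        while queue:
            u = queue.popleft()
            component.add(u)
            for v in ppi.get(u, ()):
                # Only follow edges whose both endpoints lie in `nodes`.
                if v in nodes and v not in seen:
                    seen.add(v)
                    queue.append(v)
        components.append(frozenset(component))
    return components


def candidate_complexes(expression, go_clusters, ppi, threshold, min_size=2):
    """Intersect each time point's active set with each GO cluster, map the
    result onto the PPI network, and keep connected induced subgraphs with
    at least `min_size` proteins as candidate complexes."""
    candidates = set()
    for active in active_sets(expression, threshold).values():
        for cluster in go_clusters:
            for comp in connected_components(active & cluster, ppi):
                if len(comp) >= min_size:
                    candidates.add(comp)
    return candidates


def overlap(a, b):
    """Hypothetical overlap score |A & B|^2 / (|A| * |B|)."""
    return len(a & b) ** 2 / (len(a) * len(b))


def merge_candidates(candidates, omega=0.8):
    """Greedily merge any pair whose overlap reaches `omega` until none remain."""
    complexes = [set(c) for c in candidates]
    merged = True
    while merged:
        merged = False
        for i in range(len(complexes)):
            for j in range(i + 1, len(complexes)):
                if overlap(complexes[i], complexes[j]) >= omega:
                    other = complexes.pop(j)  # j > i, so index i stays valid
                    complexes[i] |= other
                    merged = True
                    break
            if merged:
                break
    return complexes


if __name__ == "__main__":
    ppi = build_ppi(EDGES)
    cands = candidate_complexes(EXPRESSION, GO_CLUSTERS, ppi, threshold=4.0)
    for predicted in merge_candidates(cands):
        print(sorted(predicted))
```

On the toy data, the candidates are {A, B, C} (active at time point 0) and {C, D, E} (active at time point 1). Their overlap score is 1/9, so the greedy merge leaves them separate. Intersecting before mapping onto the network keeps every candidate both functionally coherent and co-active, which is the premise the method rests on.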
CPredictor3.0 can serve as a promising tool for protein complex prediction.