Feng Jianxing, Jiang Rui, Jiang Tao
Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China.
Comput Syst Bioinformatics Conf. 2008;7:51-62.
The emergence of high-throughput technologies has produced abundant protein-protein interaction (PPI) data and microarray gene expression profiles, creating a great opportunity to identify novel protein complexes with computational methods. Although the literature shows that methods using PPI data alone can successfully predict a large number of protein complexes, incorporating gene expression profiles could help refine the putative complexes and hence improve the accuracy of these methods. By combining PPI data and microarray gene expression profiles, we propose a novel Graph Fragmentation Algorithm (GFA) for protein complex identification. Adapted from a classical max-flow algorithm for finding the (weighted) densest subgraphs, GFA first finds large (weighted) dense subgraphs in a PPI network. It then iteratively breaks each such subgraph into fragments, weighting its nodes according to their log fold changes in the microarray data, until the fragment subgraphs are sufficiently small. Our extensive tests on three widely used PPI datasets, and comparisons with the latest methods for protein complex identification, demonstrate the superior performance of our method in terms of accuracy, efficiency, and the ability to predict novel protein complexes. Given the high specificity (or precision) that our method has achieved, we conjecture that our prediction results imply more than 200 novel protein complexes.
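To make the notion of a (weighted) densest subgraph concrete, the following is a minimal sketch, not the GFA implementation. It swaps the exact max-flow procedure mentioned above for a simple greedy peeling heuristic, and it does not reproduce GFA's iterative fragmentation step. The function name `densest_subgraph_peeling`, the `node_weight` parameter, and the density definition (edges inside the node set plus the sum of its node weights, divided by the number of nodes) are assumptions made for illustration; how GFA actually derives node weights from log fold changes is not specified here.

```python
from collections import defaultdict


def densest_subgraph_peeling(edges, node_weight=None):
    """Greedy peeling heuristic for a dense subgraph (illustrative only).

    Assumed density of a node set S:
        (number of edges inside S + sum of node weights in S) / |S|
    Nodes missing from node_weight get weight 0.
    Returns (best_node_set, best_density).
    """
    node_weight = node_weight or {}

    # Build an undirected adjacency structure, ignoring self-loops.
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)

    alive = set(adj)
    n_edges = sum(len(nbrs) for nbrs in adj.values()) // 2
    w_total = sum(node_weight.get(v, 0.0) for v in alive)

    best_density, best_set = float("-inf"), set()
    while alive:
        density = (n_edges + w_total) / len(alive)
        if density > best_density:
            best_density, best_set = density, set(alive)

        # Peel the node with the smallest (degree + weight); ties are
        # broken by name so the result is deterministic.
        victim = min(
            alive,
            key=lambda v: (len(adj[v]) + node_weight.get(v, 0.0), str(v)),
        )
        for nbr in adj[victim]:
            adj[nbr].discard(victim)
        n_edges -= len(adj[victim])
        w_total -= node_weight.get(victim, 0.0)
        adj[victim] = set()
        alive.remove(victim)

    return best_set, best_density


# Toy network: a 4-clique (a, b, c, d) with a short tail d-e-f.
toy_edges = [
    ("a", "b"), ("a", "c"), ("a", "d"),
    ("b", "c"), ("b", "d"), ("c", "d"),
    ("d", "e"), ("e", "f"),
]
clique, clique_density = densest_subgraph_peeling(toy_edges)
```

On the toy network the unweighted call recovers the 4-clique. Passing large weights for some nodes (for example, proteins with strong expression changes) shifts the selected subgraph toward those nodes, which is the general intuition behind weighting nodes by expression data.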