Center for Computer Vision and Department of Mathematics, Sun Yat-Sen University, Guangzhou, China.
PLoS One. 2013 May 2;8(5):e62158. doi: 10.1371/journal.pone.0062158. Print 2013.
Detecting protein complexes from protein-protein interaction (PPI) networks is a challenging task in computational biology. A vast number of computational methods have been proposed for this task. However, each method is developed to capture one aspect of the network. The performance of different methods on the same network can differ substantially, and even the same method may perform differently on networks with different topological characteristics. The clustering result of each computational method can be regarded as a feature that describes the PPI network from one aspect. It is therefore desirable to combine these features to produce a more accurate and reliable clustering. In this paper, a novel Bayesian Nonnegative Matrix Factorization (NMF)-based weighted Ensemble Clustering algorithm (EC-BNMF) is proposed to detect protein complexes from PPI networks. We first apply different computational algorithms to a PPI network to generate a set of base clustering results. We then integrate these base clustering results into an ensemble PPI network as a weighted combination. Finally, we identify overlapping protein complexes from this network by employing a Bayesian NMF model. When generating the ensemble PPI network, EC-BNMF automatically optimizes the weights so that the ensemble algorithm delivers better results. Experimental results on four PPI networks of Saccharomyces cerevisiae verify the effectiveness of EC-BNMF in detecting protein complexes. EC-BNMF provides an effective way to integrate different clustering results for more accurate and reliable complex detection. Furthermore, EC-BNMF is highly flexible in the choice of base clustering results, and it can be coupled with existing clustering methods to identify protein complexes.
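The three steps described above (base clusterings, a weighted ensemble network, factorization into overlapping complexes) can be illustrated with a minimal Python sketch. It is not the EC-BNMF implementation: the weights are supplied by hand rather than optimized, a plain symmetric NMF with multiplicative updates stands in for the Bayesian NMF model, and all function names, the co-membership encoding of a clustering, and the membership threshold are assumptions made for illustration.

```python
import numpy as np


def co_cluster_matrix(clusters, n):
    """n x n 0/1 matrix: entry (i, j) is 1 when proteins i and j share a cluster."""
    m = np.zeros((n, n))
    for cluster in clusters:
        idx = np.array(sorted(cluster), dtype=int)
        m[np.ix_(idx, idx)] = 1.0
    np.fill_diagonal(m, 0.0)  # ignore self-pairs
    return m


def ensemble_network(base_clusterings, weights, n):
    """Weighted combination of the co-cluster matrices of the base clusterings.

    Assumption: weights are given by the caller and normalized to sum to 1;
    the paper's algorithm optimizes them automatically instead.
    """
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    return sum(wi * co_cluster_matrix(c, n) for wi, c in zip(w, base_clusterings))


def symmetric_nmf(a, k, iters=500, seed=0, eps=1e-9):
    """Approximate a by h @ h.T with h >= 0, using damped multiplicative updates.

    Assumption: plain (non-Bayesian) symmetric NMF, used here only as a stand-in.
    """
    rng = np.random.default_rng(seed)
    h = rng.random((a.shape[0], k))
    for _ in range(iters):
        h *= 0.5 + 0.5 * (a @ h) / (h @ (h.T @ h) + eps)
    return h


def extract_complexes(h, threshold=0.5):
    """Assign protein i to complex j when its column-normalized loading passes
    the threshold; a protein may therefore belong to several complexes."""
    h = h / (h.max(axis=0, keepdims=True) + 1e-12)
    return [{int(i) for i in np.flatnonzero(h[:, j] >= threshold)}
            for j in range(h.shape[1])]


if __name__ == "__main__":
    # Two hypothetical base clusterings of six proteins (indices 0..5).
    clustering_a = [{0, 1, 2}, {3, 4, 5}]
    clustering_b = [{0, 1, 2, 3}, {3, 4, 5}]
    network = ensemble_network([clustering_a, clustering_b], [0.5, 0.5], n=6)
    loadings = symmetric_nmf(network, k=2)
    print(extract_complexes(loadings))
```

Thresholding the loadings, rather than taking the largest entry per protein, is what lets a protein appear in more than one complex, which matches the overlapping complexes the abstract refers to.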