Wu Min, Ou-Yang Le, Li Xiao-Li
IEEE/ACM Trans Comput Biol Bioinform. 2017 May-Jun;14(3):733-739. doi: 10.1109/TCBB.2016.2552176. Epub 2016 Apr 8.
With the increasing availability of protein interaction data, various computational methods have been developed to predict protein complexes. However, each method has its own advantages and limitations. Ensemble clustering has thus been studied to minimize the potential bias and risk of individual methods and to generate predictions with better coverage and accuracy. In this paper, we extend traditional ensemble clustering by taking co-complex affinity scores into account and present an Ensemble Hierarchical Clustering framework (EnsemHC) to detect protein complexes. First, we construct co-cluster matrices by integrating the clustering results with the co-complex evidence. Second, we sum the constructed co-cluster matrices to derive a final ensemble matrix via a novel iterative weighting scheme. Finally, we apply hierarchical clustering to generate protein complexes from the final ensemble matrix. Experimental results demonstrate that EnsemHC performs better than its base clustering methods and various existing integrative methods. We also observed that integrating clusters and co-complex affinity scores from different data sources improves prediction performance; for example, integrating the clusters from TAP data with co-complex affinities from binary PPI data achieved the best performance in our experiments.
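The three steps above can be illustrated with a minimal sketch. The abstract does not give the formulas, so everything specific here is an assumption made for illustration: the co-cluster entry is taken to be the affinity score of a co-clustered pair, the iterative weighting (a hypothetical stand-in for the paper's scheme) favors base clusterings that agree with the current ensemble, and the final step uses average linkage with a fixed similarity cutoff. All function names, the toy data, and the threshold are hypothetical.

```python
from itertools import combinations

def co_cluster_matrix(clusters, affinity, n):
    # Assumption: a co-clustered pair contributes its co-complex affinity score.
    m = [[0.0] * n for _ in range(n)]
    for cluster in clusters:
        for i, j in combinations(sorted(cluster), 2):
            m[i][j] = m[j][i] = affinity[i][j]
    return m

def weighted_sum(matrices, weights, n):
    # Ensemble matrix as a weighted sum of the co-cluster matrices.
    return [[sum(w * m[i][j] for w, m in zip(weights, matrices))
             for j in range(n)] for i in range(n)]

def iterative_weights(matrices, n, rounds=20):
    # Hypothetical weighting: matrices closer to the ensemble get more weight.
    k = len(matrices)
    weights = [1.0 / k] * k
    for _ in range(rounds):
        ens = weighted_sum(matrices, weights, n)
        dists = [sum((m[i][j] - ens[i][j]) ** 2
                     for i in range(n) for j in range(n)) for m in matrices]
        raw = [1.0 / (1.0 + d) for d in dists]
        total = sum(raw)
        weights = [r / total for r in raw]
    return weights

def average_linkage(sim, n, threshold):
    # Agglomerative clustering: merge the most similar pair of clusters
    # until no pair reaches the similarity threshold.
    clusters = [[i] for i in range(n)]
    while len(clusters) > 1:
        best, pair = threshold, None
        for a, b in combinations(range(len(clusters)), 2):
            links = [sim[i][j] for i in clusters[a] for j in clusters[b]]
            s = sum(links) / len(links)
            if s >= best:
                best, pair = s, (a, b)
        if pair is None:
            break
        a, b = pair
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return [sorted(c) for c in clusters]

def detect_complexes(base_clusterings, affinity, n, threshold=0.3):
    matrices = [co_cluster_matrix(c, affinity, n) for c in base_clusterings]
    weights = iterative_weights(matrices, n)
    ensemble = weighted_sum(matrices, weights, n)
    return average_linkage(ensemble, n, threshold), weights

# Toy data (made up): 5 proteins, symmetric affinity scores in [0, 1].
n = 5
affinity = [[0.0] * n for _ in range(n)]
for (i, j), score in {(0, 1): 0.9, (0, 2): 0.8, (1, 2): 0.7,
                      (3, 4): 0.9, (2, 3): 0.1, (2, 4): 0.1}.items():
    affinity[i][j] = affinity[j][i] = score

base_clusterings = [
    [{0, 1, 2}, {3, 4}],   # base method A
    [{0, 1}, {2, 3, 4}],   # base method B (disagrees about protein 2)
    [{0, 1, 2}, {3, 4}],   # base method C
]
complexes, weights = detect_complexes(base_clusterings, affinity, n)
print(complexes, weights)
```

In this toy setup, the low affinity between protein 2 and proteins 3 and 4 weakens base method B's grouping, which is the kind of effect the abstract attributes to combining clustering results with co-complex evidence.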