Division of Biomedical Engineering, University of Saskatchewan, Saskatoon, SK, Canada.
Proteomics. 2013 Jan;13(2):269-77. doi: 10.1002/pmic.201200336. Epub 2012 Nov 29.
The identification of protein complexes plays a key role in understanding major cellular processes and biological functions. Various computational algorithms have been proposed to identify protein complexes from protein-protein interaction (PPI) networks. In this paper, we first introduce a new seed-selection strategy for seed-growth style algorithms, in which cliques rather than individual vertices are employed as initial seeds. We then propose a result-modification approach based on this seed-selection strategy: predictions generated from higher order clique seeds are used to modify results generated from lower order ones. The performance of the seed-selection strategy and of the result-modification approach is tested using the entropy-based algorithm, which is currently the best seed-growth style algorithm for detecting protein complexes from PPI networks. In addition, we investigate four pairs of strategies for this algorithm in order to improve its accuracy. The numerical experiments are conducted on a Saccharomyces cerevisiae PPI network. The best group of predictions consists of 1711 clusters, with an average f-score of 0.68 after removing all similar and redundant clusters. We conclude that higher order clique seeds generate predictions with higher accuracy and that our improved entropy-based algorithm outputs more reasonable predictions than the original one.
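The abstract does not give the algorithm's details, so the following is only a minimal sketch of clique-seeded growth under stated assumptions, not the authors' implementation. It assumes the commonly used graph-entropy definition (each vertex contributes the binary entropy of the fraction of its neighbours that lie inside the cluster), uses order-3 cliques (triangles) as seeds, grows a cluster greedily by adding any boundary vertex that lowers the total entropy, and scores a prediction with the usual overlap-based f-score. All function names and the toy network are hypothetical.

```python
import math
from itertools import combinations

def vertex_entropy(adj, cluster, v):
    """Binary entropy of the fraction of v's neighbours inside `cluster` (assumed definition)."""
    neighbours = adj[v]
    if not neighbours:
        return 0.0
    p_in = len(neighbours & cluster) / len(neighbours)
    if p_in in (0.0, 1.0):
        return 0.0  # all neighbours inside or all outside: no uncertainty
    p_out = 1.0 - p_in
    return -p_in * math.log2(p_in) - p_out * math.log2(p_out)

def graph_entropy(adj, cluster):
    """Sum of vertex entropies over every vertex in the network."""
    return sum(vertex_entropy(adj, cluster, v) for v in adj)

def triangle_seeds(adj):
    """Enumerate order-3 cliques by brute force (fine for a toy network only)."""
    return [frozenset((a, b, c))
            for a, b, c in combinations(sorted(adj), 3)
            if b in adj[a] and c in adj[a] and c in adj[b]]

def grow(adj, seed):
    """Greedily add boundary vertices while each addition lowers the graph entropy."""
    cluster = set(seed)
    best = graph_entropy(adj, cluster)
    improved = True
    while improved:
        improved = False
        boundary = set().union(*(adj[v] for v in cluster)) - cluster
        for v in sorted(boundary):
            candidate = cluster | {v}
            entropy = graph_entropy(adj, candidate)
            if entropy < best:
                cluster, best = candidate, entropy
                improved = True
    return cluster

def f_score(predicted, reference):
    """Harmonic mean of overlap-based precision and recall (assumed scoring rule)."""
    overlap = len(predicted & reference)
    if overlap == 0:
        return 0.0
    precision = overlap / len(predicted)
    recall = overlap / len(reference)
    return 2 * precision * recall / (precision + recall)

# Hypothetical toy network: a 4-clique {A,B,C,D} joined to a triangle {E,F,G} by edge D-E.
toy_adj = {
    "A": {"B", "C", "D"}, "B": {"A", "C", "D"}, "C": {"A", "B", "D"},
    "D": {"A", "B", "C", "E"}, "E": {"D", "F", "G"},
    "F": {"E", "G"}, "G": {"E", "F"},
}
clusters = {frozenset(grow(toy_adj, seed)) for seed in triangle_seeds(toy_adj)}
```

On this toy network every triangle seed inside the 4-clique grows to the same cluster, which illustrates why similar and redundant clusters need to be removed before the predictions are scored.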