Suppr超能文献

用于蛋白质相互作用图划分的马尔可夫聚类与亲和传播算法

Markov clustering versus affinity propagation for the partitioning of protein interaction graphs.

作者信息

Vlasblom James, Wodak Shoshana J

机构信息

Molecular Structure and Function Program, Hospital for Sick Children, Toronto, Ontario, Canada.

出版信息

BMC Bioinformatics. 2009 Mar 30;10:99. doi: 10.1186/1471-2105-10-99.

Abstract

BACKGROUND

Genome scale data on protein interactions are generally represented as large networks, or graphs, where hundreds or thousands of proteins are linked to one another. Since proteins tend to function in groups, or complexes, an important goal has been to reliably identify protein complexes from these graphs. This task is commonly executed using clustering procedures, which aim at detecting densely connected regions within the interaction graphs. There exists a wealth of clustering algorithms, some of which have been applied to this problem. One of the most successful clustering procedures in this context has been the Markov Cluster algorithm (MCL), which was recently shown to outperform a number of other procedures, some of which were specifically designed for partitioning protein interactions graphs. A novel promising clustering procedure termed Affinity Propagation (AP) was recently shown to be particularly effective, and much faster than other methods for a variety of problems, but has not yet been applied to partition protein interaction graphs.

RESULTS

In this work we compare the performance of the Affinity Propagation (AP) and Markov Clustering (MCL) procedures. To this end we derive an unweighted network of protein-protein interactions from a set of 408 protein complexes from S. cervisiae hand curated in-house, and evaluate the performance of the two clustering algorithms in recalling the annotated complexes. In doing so the parameter space of each algorithm is sampled in order to select optimal values for these parameters, and the robustness of the algorithms is assessed by quantifying the level of complex recall as interactions are randomly added or removed to the network to simulate noise. To evaluate the performance on a weighted protein interaction graph, we also apply the two algorithms to the consolidated protein interaction network of S. cerevisiae, derived from genome scale purification experiments and to versions of this network in which varying proportions of the links have been randomly shuffled.

CONCLUSION

Our analysis shows that the MCL procedure is significantly more tolerant to noise and behaves more robustly than the AP algorithm. The advantage of MCL over AP is dramatic for unweighted protein interaction graphs, as AP displays severe convergence problems on the majority of the unweighted graph versions that we tested, whereas MCL continues to identify meaningful clusters, albeit fewer of them, as the level of noise in the graph increases. MCL thus remains the method of choice for identifying protein complexes from binary interaction networks.

摘要

背景

蛋白质相互作用的基因组规模数据通常表示为大型网络或图,其中数百或数千个蛋白质相互连接。由于蛋白质倾向于以组或复合物的形式发挥作用,一个重要目标是从这些图中可靠地识别蛋白质复合物。这项任务通常使用聚类程序来执行,其目的是检测相互作用图中紧密连接的区域。存在大量聚类算法,其中一些已应用于此问题。在这种情况下,最成功的聚类程序之一是马尔可夫聚类算法(MCL),最近它被证明优于许多其他程序,其中一些是专门为划分蛋白质相互作用图而设计的。一种名为亲和传播(AP)的新型有前途的聚类程序最近被证明特别有效,并且在解决各种问题时比其他方法快得多,但尚未应用于划分蛋白质相互作用图。

结果

在这项工作中,我们比较了亲和传播(AP)和马尔可夫聚类(MCL)程序的性能。为此,我们从内部手工整理的一组408个酿酒酵母蛋白质复合物中导出了一个未加权的蛋白质 - 蛋白质相互作用网络,并评估了这两种聚类算法在召回注释复合物方面的性能。在此过程中,对每种算法的参数空间进行采样,以便为这些参数选择最佳值,并通过量化在网络中随机添加或删除相互作用以模拟噪声时复合物召回的水平来评估算法的稳健性。为了评估在加权蛋白质相互作用图上的性能,我们还将这两种算法应用于酿酒酵母的整合蛋白质相互作用网络,该网络源自基因组规模的纯化实验以及该网络中不同比例的链接已被随机打乱的版本。

结论

我们的分析表明,MCL程序对噪声的耐受性明显更高,并且比AP算法表现得更稳健。对于未加权的蛋白质相互作用图,MCL相对于AP的优势非常明显,因为AP在我们测试的大多数未加权图版本上都表现出严重的收敛问题,而随着图中噪声水平的增加,MCL仍然能够识别有意义的聚类,尽管数量较少。因此,MCL仍然是从二元相互作用网络中识别蛋白质复合物的首选方法。

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8b14/2682798/0332cc643624/1471-2105-10-99-1.jpg

文献检索

告别复杂PubMed语法,用中文像聊天一样搜索,搜遍4000万医学文献。AI智能推荐,让科研检索更轻松。

立即免费搜索

文件翻译

保留排版,准确专业,支持PDF/Word/PPT等文件格式,支持 12+语言互译。

免费翻译文档

深度研究

AI帮你快速写综述,25分钟生成高质量综述,智能提取关键信息,辅助科研写作。

立即免费体验