Evaluation of clustering algorithms for protein complex and protein interaction network assembly.

Author information

Sardiu Mihaela E, Florens Laurence, Washburn Michael P

Affiliation

Stowers Institute for Medical Research, Kansas City, Missouri 64110, USA.

Publication information

J Proteome Res. 2009 Jun;8(6):2944-52. doi: 10.1021/pr900073d.

Abstract

Assembling protein complexes and protein interaction networks from affinity purification-based proteomics data sets remains a challenge. When little a priori knowledge of the complexes exists, it is difficult to place proteins in the proper locations and to evaluate the results of clustering approaches. Here we have systematically compared multiple hierarchical and partitioning clustering approaches using a well-characterized but highly complex human protein interaction network data set centered around the conserved AAA+ ATPases Tip49a and Tip49b. This network is challenging for clustering algorithms because Tip49a and Tip49b are present in four distinct complexes, the network contains modules, and the network has multiple attachments. We compared the use of binary data, quantitative proteomics data in the form of normalized spectral abundance factors, and Z-score normalization. In our analysis, a partitioning approach indicated the major modules in the network. Next, although Euclidean distance was sensitive to scaling, after data transformation all the attachments in the data set were recovered in one branch of a dendrogram. Finally, when Pearson correlation and hierarchical clustering were used, complexes were well separated and their attachments were placed in the proper locations. Each of these three approaches provided distinct information useful for assembling a network of multiple protein complexes.
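Why Euclidean distance is sensitive to scaling while Pearson correlation is not can be illustrated with a minimal sketch. It assumes the standard NSAF definition (spectral count divided by protein length, normalized by the sum of that ratio over all proteins in one purification) and, as an assumption, applies Z-score normalization to each protein's profile across purifications. The function names and toy values are hypothetical; they are not the study's data or software.

```python
import math


def nsaf(spectral_counts, lengths):
    """NSAF per protein in one purification: (SpC / L) / sum over proteins of (SpC / L)."""
    saf = [spc / length for spc, length in zip(spectral_counts, lengths)]
    total = sum(saf)
    return [s / total for s in saf]


def zscore(values):
    """Center a profile to mean 0 and scale it to unit (population) standard deviation."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    if sd == 0:
        raise ValueError("constant profile has no Z-score")
    return [(v - mean) / sd for v in values]


def euclidean(a, b):
    """Euclidean distance: depends on the absolute magnitude of the values."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def pearson_distance(a, b):
    """1 - Pearson r: 0 for proportional profiles, 2 for perfectly anti-correlated ones."""
    za, zb = zscore(a), zscore(b)
    r = sum(x * y for x, y in zip(za, zb)) / len(a)
    return 1.0 - r


if __name__ == "__main__":
    # Hypothetical NSAF profiles of two proteins across four bait purifications;
    # protein_b has the same pattern as protein_a at one tenth the abundance.
    protein_a = [0.02, 0.01, 0.0, 0.03]
    protein_b = [0.002, 0.001, 0.0, 0.003]
    print("Euclidean, raw NSAF:   ", euclidean(protein_a, protein_b))
    print("Euclidean, Z-scored:   ", euclidean(zscore(protein_a), zscore(protein_b)))
    print("Pearson distance, raw: ", pearson_distance(protein_a, protein_b))
```

On raw values the two proteins look far apart under Euclidean distance purely because of their abundance difference. After Z-score transformation, or under a Pearson-based distance, their co-purification patterns are recognized as identical, which is the property that lets correlation-based clustering group proteins by the baits they co-purify with.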
