Greene Derek, Cagney Gerard, Krogan Nevan, Cunningham Pádraig
School of Computer Science and Informatics, University College Dublin, Dublin, Ireland.
Bioinformatics. 2008 Aug 1;24(15):1722-8. doi: 10.1093/bioinformatics/btn286. Epub 2008 Jun 12.
When working with large-scale protein interaction data, an important analysis task is the assignment of pairs of proteins to groups that correspond to higher order assemblies. Previously, a common approach to this problem has been to apply standard hierarchical clustering methods to identify such groups. Here we propose a new algorithm for aggregating a diverse collection of matrix factorizations to produce a more informative clustering, which takes the form of a 'soft' hierarchy of clusters.
We apply the proposed Ensemble non-negative matrix factorization (NMF) algorithm to a high-quality assembly of binary protein interactions derived from two proteome-wide studies in yeast. Our experimental evaluation demonstrates that the algorithm lends itself to discovering small localized structures in these data, which correspond to known functional groupings of complexes. In addition, we show that the algorithm supports the assignment of putative functions to previously uncharacterized proteins, for instance the protein YNR024W, which may be an uncharacterized component of the exosome.
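The general idea of combining several factorizations can be illustrated with a minimal sketch. The abstract does not spell out the aggregation step, so the code below is an assumption-laden stand-in and not the authors' algorithm: it runs a plain NMF (Lee-Seung multiplicative updates) several times with different ranks and random seeds on a toy interaction matrix, then averages the resulting soft memberships into a co-association matrix. The function names (`nmf`, `soft_memberships`, `ensemble_coassociation`), the toy matrix `A`, and the choice of ranks and seeds are all hypothetical, and the construction of the 'soft' hierarchy itself is not attempted.

```python
import random

def matmul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    cols = list(zip(*Y))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in X]

def transpose(X):
    return [list(col) for col in zip(*X)]

def nmf(A, k, seed, iters=200, eps=1e-9):
    """Approximate non-negative A (n x m) as W @ H with rank k,
    using Lee-Seung multiplicative updates for the Frobenius loss."""
    rng = random.Random(seed)
    n, m = len(A), len(A[0])
    W = [[rng.random() + 0.1 for _ in range(k)] for _ in range(n)]
    H = [[rng.random() + 0.1 for _ in range(m)] for _ in range(k)]
    for _ in range(iters):
        Wt = transpose(W)
        num, den = matmul(Wt, A), matmul(matmul(Wt, W), H)
        H = [[H[r][c] * num[r][c] / (den[r][c] + eps) for c in range(m)]
             for r in range(k)]
        Ht = transpose(H)
        num, den = matmul(A, Ht), matmul(W, matmul(H, Ht))
        W = [[W[r][c] * num[r][c] / (den[r][c] + eps) for c in range(k)]
             for r in range(n)]
    return W, H

def soft_memberships(W, eps=1e-9):
    """Scale each row of W so one protein's cluster weights sum to about 1."""
    return [[w / (sum(row) + eps) for w in row] for row in W]

def ensemble_coassociation(A, ks, seeds):
    """Average M @ M.T over all (rank, seed) runs, where M holds the soft
    memberships of one run. This is a simple stand-in for aggregation."""
    n = len(A)
    C = [[0.0] * n for _ in range(n)]
    runs = 0
    for k in ks:
        for seed in seeds:
            W, _ = nmf(A, k, seed)
            M = soft_memberships(W)
            S = matmul(M, transpose(M))
            for i in range(n):
                for j in range(n):
                    C[i][j] += S[i][j]
            runs += 1
    return [[c / runs for c in row] for row in C]

# Toy symmetric interaction matrix (made-up data): proteins 0-2 form one
# fully connected group and proteins 3-5 another. The diagonal is set to 1
# as a simplification so that each group is a clean block.
A = [[1.0 if (i < 3) == (j < 3) else 0.0 for j in range(6)] for i in range(6)]

C = ensemble_coassociation(A, ks=[2, 3], seeds=[0, 1, 2])
for row in C:
    print(" ".join(f"{c:.2f}" for c in row))
```

In a matrix like `C`, larger entries indicate protein pairs that were grouped together in more of the runs. One could feed such a matrix to an agglomerative clustering routine to obtain a hierarchy, though that would still be only a rough analogue of the method the abstract describes.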