Michael Smith Laboratories, University of British Columbia, Vancouver, Canada.
Mol Cell Proteomics. 2021;20:100002. doi: 10.1074/mcp.RA120.002275. Epub 2020 Nov 24.
Biological functions emerge from complex and dynamic networks of protein-protein interactions. Because these protein-protein interaction networks, or interactomes, represent pairwise connections within a hierarchically organized system, it is often useful to identify higher-order associations embedded within them, such as multimember protein complexes. Graph-based clustering techniques are widely used to accomplish this goal, and dozens of field-specific and general clustering algorithms exist. However, interactomes can be prone to errors, especially when inferred from high-throughput biochemical assays. Therefore, robustness to network-level noise is an important criterion for choosing a clustering algorithm. Here, we tested the robustness of a range of graph-based clustering algorithms in the presence of noise, including algorithms common across domains and those specific to protein networks. Strikingly, we found that all of the clustering algorithms tested here markedly amplified network-level noise. Randomly rewiring only 1% of network edges yielded more than a 50% change in clustering results. Moreover, we found that the impact of network noise on individual clusters was not uniform: some clusters were consistently robust to injected noise, whereas others were not. Therefore, we developed the clust.perturb R package and Shiny web application to measure the reproducibility of clusters by randomly perturbing the network. We show that clust.perturb results are predictive of real-world cluster stability: poorly reproducible clusters as identified by clust.perturb are significantly less likely to be reclustered across experiments. We conclude that graph-based clustering amplifies noise in protein interaction networks, but quantifying the robustness of a cluster to network noise can separate stable protein complexes from spurious associations.
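The perturb-and-recluster idea described above can be illustrated with a minimal Python sketch. This is not the clust.perturb implementation, which is an R package; all function names here (`rewire`, `connected_components`, `reproducibility`) are hypothetical. Connected components stand in for a real clustering algorithm, and the score shown, the mean best-match Jaccard index over repeated perturbations, is one plausible way to quantify reproducibility, assumed here for illustration.

```python
import random


def normalize(edges):
    """Represent an undirected edge list as a set of sorted node pairs."""
    return {tuple(sorted(e)) for e in edges}


def rewire(edges, fraction, rng):
    """Replace a random `fraction` of edges with edges absent from the original."""
    edges = normalize(edges)
    nodes = sorted({n for e in edges for n in e})
    n_swap = round(fraction * len(edges))
    # Guard against an endless loop when too few new edges are possible.
    free_slots = len(nodes) * (len(nodes) - 1) // 2 - len(edges)
    if n_swap > free_slots:
        raise ValueError("not enough absent edges to rewire this fraction")
    kept = set(edges)
    for e in rng.sample(sorted(edges), n_swap):
        kept.discard(e)
    while len(kept) < len(edges):
        candidate = tuple(sorted(rng.sample(nodes, 2)))
        if candidate not in edges:  # only accept genuinely new edges
            kept.add(candidate)
    return kept


def connected_components(edges):
    """Toy stand-in for a clustering algorithm: one cluster per component."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    seen, clusters = set(), []
    for start in sorted(adj):
        if start in seen:
            continue
        stack, comp = [start], set()
        while stack:
            node = stack.pop()
            if node not in comp:
                comp.add(node)
                stack.extend(adj[node] - comp)
        seen |= comp
        clusters.append(frozenset(comp))
    return clusters


def jaccard(a, b):
    return len(a & b) / len(a | b)


def reproducibility(edges, cluster_fn, fraction=0.01, n_iter=100, seed=0):
    """Mean best-match Jaccard of each original cluster across noisy reclusterings."""
    rng = random.Random(seed)
    original = cluster_fn(normalize(edges))
    totals = [0.0] * len(original)
    for _ in range(n_iter):
        perturbed = cluster_fn(rewire(edges, fraction, rng))
        for i, cluster in enumerate(original):
            totals[i] += max((jaccard(cluster, p) for p in perturbed), default=0.0)
    return {cluster: total / n_iter for cluster, total in zip(original, totals)}


# Toy network: two fully connected groups of four proteins (12 edges in total).
toy_edges = [(a, b) for grp in ("ABCD", "EFGH")
             for i, a in enumerate(grp) for b in grp[i + 1:]]
for cluster, score in reproducibility(toy_edges, connected_components,
                                      fraction=0.1, n_iter=50).items():
    print(sorted(cluster), round(score, 2))
```

In this toy network, a single rewired edge can bridge the two groups and merge them into one component, so a small change at the edge level can produce a much larger change at the cluster level. A score near 1 means a cluster is recovered almost unchanged under noise, whereas a low score flags it as poorly reproducible.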