Lewis-Sigler Institute for Integrative Genomics and Department of Computer Science, Princeton University, Princeton, NJ 08544, USA.
Bioinformatics. 2010 Apr 15;26(8):1105-11. doi: 10.1093/bioinformatics/btq078. Epub 2010 Feb 24.
Clustering algorithms play an important role in the analysis of biological networks, and can be used to uncover functional modules and obtain hints about cellular organization. While most available clustering algorithms work well on biological networks of moderate size, such as the yeast protein physical interaction network, they either fail or are too slow in practice for larger networks, such as functional networks for higher eukaryotes. Since an increasing number of larger biological networks are being determined, the limitations of current clustering approaches curtail the types of biological network analyses that can be performed.
We present a fast local network clustering algorithm SPICi. SPICi runs in time O(V log V+E) and space O(E), where V and E are the number of vertices and edges in the network, respectively. We evaluate SPICi's performance on several existing protein interaction networks of varying size, and compare SPICi to nine previous approaches for clustering biological networks. We show that SPICi is typically several orders of magnitude faster than previous approaches and is the only one that can successfully cluster all test networks within very short time. We demonstrate that SPICi has state-of-the-art performance with respect to the quality of the clusters it uncovers, as judged by its ability to recapitulate protein complexes and functional modules. Finally, we demonstrate the power of our fast network clustering algorithm by applying SPICi across hundreds of large context-specific human networks, and identifying modules specific for single conditions.
Source code is available under the GNU Public License at http://compbio.cs.princeton.edu/spici.
聚类算法在生物网络分析中起着重要作用,可用于发现功能模块并获得有关细胞组织的提示。虽然大多数可用的聚类算法在中等规模的生物网络(如酵母蛋白质物理相互作用网络)上运行良好,但对于更大的网络(如高等真核生物的功能网络),它们要么无法正常工作,要么在实践中速度太慢。由于越来越多的更大的生物网络正在被确定,当前聚类方法的局限性限制了可以进行的生物网络分析类型。
我们提出了一种快速的局部网络聚类算法 SPICi。SPICi 的运行时间为 O(VlogV+E),空间复杂度为 O(E),其中 V 和 E 分别是网络中的顶点数和边数。我们在几个大小不同的现有蛋白质相互作用网络上评估了 SPICi 的性能,并将 SPICi 与之前用于聚类生物网络的九种方法进行了比较。我们表明,SPICi 通常比以前的方法快几个数量级,并且是唯一能够在非常短的时间内成功聚类所有测试网络的方法。我们证明,SPICi 在其发现的聚类质量方面具有最先进的性能,这可以通过其重新生成蛋白质复合物和功能模块的能力来判断。最后,我们通过在数百个大型特定于上下文的人类网络上应用 SPICi 并识别特定于单个条件的模块,展示了我们快速网络聚类算法的强大功能。
源代码可在 GNU 公共许可证下从 http://compbio.cs.princeton.edu/spici 获得。