Mete Mutlu, Tang Fusheng, Xu Xiaowei, Yuruk Nurcan
Department of Applied Science, University of Arkansas at Little Rock, Little Rock, Arkansas, USA.
BMC Bioinformatics. 2008 Aug 12;9 Suppl 9(Suppl 9):S19. doi: 10.1186/1471-2105-9-S9-S19.
Biological systems can be modeled as complex network systems with many interactions between the components. These interactions give rise to the function and behavior of that system. For example, the protein-protein interaction network is the physical basis of multiple cellular functions. One goal of emerging systems biology is to analyze very large complex biological networks such as protein-protein interaction networks, metabolic networks, and regulatory networks to identify functional modules and assign functions to certain components of the system. Network modules do not occur by chance, so identification of modules is likely to capture the biologically meaningful interactions in large-scale PPI data. Unfortunately, existing computer-based clustering methods developed to find those modules are either not so accurate or too slow.
We devised a new methodology called SCAN (Structural Clustering Algorithm for Networks) that can efficiently find clusters or functional modules in complex biological networks as well as hubs and outliers. More specifically, we demonstrated that we can find functional modules in complex networks and classify nodes into various roles based on their structures. In this study, we showed the effectiveness of our methodology using the budding yeast (Saccharomyces cerevisiae) protein-protein interaction network. To validate our clustering results, we compared our clusters with the known functions of each protein. Our predicted functional modules achieved very high purity comparing with state-of-the-art approaches. Additionally the theoretical and empirical analysis demonstrated a linear running-time of the algorithm, which is the fastest approach for networks.
We compare our algorithm with well-known modularity based clustering algorithm CNM. We successfully detect functional groups that are annotated with putative GO terms. Top-10 clusters with minimum p-value theoretically prove that newly proposed algorithm partitions network more accurately then CNM. Furthermore, manual interpretations of functional groups found by SCAN show superior performance over CNM.
生物系统可被建模为组件间存在众多相互作用的复杂网络系统。这些相互作用产生了该系统的功能和行为。例如,蛋白质 - 蛋白质相互作用网络是多种细胞功能的物理基础。新兴系统生物学的一个目标是分析非常大的复杂生物网络,如蛋白质 - 蛋白质相互作用网络、代谢网络和调控网络,以识别功能模块并为系统的某些组件赋予功能。网络模块并非偶然出现,因此识别模块可能会捕捉大规模蛋白质 - 蛋白质相互作用(PPI)数据中有生物学意义的相互作用。不幸的是,为找到这些模块而开发的现有基于计算机的聚类方法要么不够准确,要么速度太慢。
我们设计了一种名为SCAN(网络结构聚类算法)的新方法,它能够在复杂生物网络中高效地找到聚类或功能模块以及中心节点和离群点。更具体地说,我们证明了我们可以在复杂网络中找到功能模块,并根据节点的结构将其分类为各种角色。在本研究中,我们使用芽殖酵母(酿酒酵母)蛋白质 - 蛋白质相互作用网络展示了我们方法的有效性。为了验证我们的聚类结果,我们将我们的聚类与每种蛋白质的已知功能进行了比较。与现有最先进的方法相比,我们预测的功能模块具有非常高的纯度。此外,理论和实证分析表明该算法具有线性运行时间,这是处理网络的最快方法。
我们将我们的算法与著名的基于模块度的聚类算法CNM进行了比较。我们成功检测到了用假定的基因本体(GO)术语注释的功能组。理论上具有最小p值的前10个聚类证明新提出的算法比CNM更准确地划分网络。此外,对SCAN找到的功能组的人工解释显示出比CNM更好的性能。