Department of Computer Science, University of California, Santa Barbara, CA 93106, USA.
BMC Bioinformatics. 2009 Sep 9;10:283. doi: 10.1186/1471-2105-10-283.
We propose an efficient and biologically sensitive algorithm based on repeated random walks (RRW) for discovering functional modules, e.g., complexes and pathways, within large-scale protein networks. Compared to existing cluster identification techniques, RRW implicitly makes use of network topology, edge weights, and long range interactions between proteins.
We apply the proposed technique on a functional network of yeast genes and accurately identify statistically significant clusters of proteins. We validate the biological significance of the results using known complexes in the MIPS complex catalogue database and well-characterized biological processes. We find that 90% of the created clusters have the majority of their catalogued proteins belonging to the same MIPS complex, and about 80% have the majority of their proteins involved in the same biological process. We compare our method to various other clustering techniques, such as the Markov Clustering Algorithm (MCL), and find a significant improvement in the RRW clusters' precision and accuracy values.
RRW, which is a technique that exploits the topology of the network, is more precise and robust in finding local clusters. In addition, it has the added flexibility of being able to find multi-functional proteins by allowing overlapping clusters.
我们提出了一种基于重复随机游走(RRW)的高效且敏感的生物学算法,用于在大规模蛋白质网络中发现功能模块,例如复合物和途径。与现有的聚类识别技术相比,RRW 隐含地利用了网络拓扑、边权重以及蛋白质之间的远程相互作用。
我们将提出的技术应用于酵母基因的功能网络中,准确地识别出具有统计学意义的蛋白质簇。我们使用 MIPS 复合物目录数据库中的已知复合物和经过充分表征的生物学过程来验证结果的生物学意义。我们发现,创建的簇中有 90%的簇的大多数目录蛋白属于同一个 MIPS 复合物,大约 80%的簇的大多数蛋白都参与了同一个生物学过程。我们将我们的方法与其他各种聚类技术(如 Markov 聚类算法(MCL))进行比较,发现 RRW 聚类在精度和准确性方面有显著提高。
RRW 是一种利用网络拓扑结构的技术,在发现局部簇方面更精确、更稳健。此外,它还具有通过允许重叠簇来发现多功能蛋白的灵活性。