Fast parallel Markov clustering in bioinformatics using massively parallel computing on GPU with CUDA and ELLPACK-R sparse format.

Author information

Department of Mathematics, University of Indonesia, Depok 16424, Indonesia.

Publication information

IEEE/ACM Trans Comput Biol Bioinform. 2012 May-Jun;9(3):679-92. doi: 10.1109/TCBB.2011.68.

Abstract

Markov clustering (MCL) is becoming a key algorithm within bioinformatics for determining clusters in networks. However, with the increasingly vast amount of data on biological networks, performance and scalability issues are becoming a critical limiting factor in applications. Meanwhile, GPU computing, which uses the CUDA tool to implement a massively parallel computing environment on the GPU card, is becoming a very powerful, efficient, and low-cost option for achieving substantial performance gains over CPU approaches. The use of on-chip memory on the GPU efficiently lowers latency, thus circumventing a major issue in other parallel computing environments, such as MPI. We introduce a very fast Markov clustering algorithm using CUDA (CUDA-MCL) to perform parallel sparse matrix-matrix computations and parallel sparse Markov matrix normalizations, which are at the heart of MCL. We utilized the ELLPACK-R sparse format to allow effective, fine-grained massively parallel processing that copes with the sparse nature of interaction network data sets in bioinformatics applications. As the results show, CUDA-MCL is significantly faster than the original MCL running on CPU. Thus, large-scale parallel computation on off-the-shelf desktop machines, which was previously only possible on supercomputing architectures, can significantly change the way bioinformaticians and biologists deal with their data.
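
To make the two operations named in the abstract more concrete, the following is a minimal CPU-only sketch in plain Python, not the authors' CUDA implementation. It stores a matrix in an ELLPACK-R style layout (padded value and column-index arrays plus a per-row length array), then performs a sparse matrix-matrix product (MCL expansion) and an elementwise power followed by normalization (MCL inflation). The function names `to_ellpack_r`, `expand`, `normalize_rows`, and `inflate` are hypothetical, and the sketch assumes a square matrix whose stored rows are the vectors to be normalized; the abstract does not state which orientation CUDA-MCL uses. Each outer loop over rows corresponds to work that a GPU version could plausibly assign to one thread per row.

```python
def to_ellpack_r(dense):
    """Convert a dense matrix (list of rows) to ELLPACK-R style arrays.

    Returns (values, cols, row_len): every row is padded to the width of
    the longest row, and row_len[r] records how many entries are real.
    """
    row_len = [sum(1 for v in row if v != 0) for row in dense]
    width = max(row_len, default=0)
    values, cols = [], []
    for row in dense:
        nz = [(j, v) for j, v in enumerate(row) if v != 0]
        pad = width - len(nz)
        cols.append([j for j, _ in nz] + [0] * pad)        # padding index is arbitrary
        values.append([v for _, v in nz] + [0.0] * pad)    # padding value is zero
    return values, cols, row_len


def expand(values, cols, row_len):
    """MCL expansion: multiply the (square) matrix by itself, row by row."""
    n = len(values)
    result = []
    for r in range(n):                      # independent per row
        acc = [0.0] * n                     # dense accumulator for row r
        for k in range(row_len[r]):         # row_len avoids touching padding
            mid, a = cols[r][k], values[r][k]
            for k2 in range(row_len[mid]):
                acc[cols[mid][k2]] += a * values[mid][k2]
        result.append(acc)
    return to_ellpack_r(result)


def normalize_rows(values, row_len):
    """Scale the real entries of each stored row so that they sum to 1."""
    out = []
    for r, row in enumerate(values):
        n = row_len[r]
        total = sum(row[:n])
        real = [v / total for v in row[:n]] if total else list(row[:n])
        out.append(real + [0.0] * (len(row) - n))
    return out


def inflate(values, row_len, power=2.0):
    """MCL inflation: elementwise power, then renormalize each stored row."""
    powered = [[v ** power for v in row] for row in values]
    return normalize_rows(powered, row_len)
```

One iteration of this sketch is `expand` followed by `inflate`; repeating the pair until the matrix stops changing gives the usual MCL loop. The per-row length array is what distinguishes ELLPACK-R from plain ELLPACK: it lets each row stop at its own last real entry instead of scanning the padding.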
