CMSA: a heterogeneous CPU/GPU computing system for multiple similar RNA/DNA sequence alignment.

Author information

Chen Xi, Wang Chen, Tang Shanjiang, Yu Ce, Zou Quan

Affiliation

School of Computer Science and Technology, Tianjin University, Yaguan Road, Tianjin, China.

Publication information

BMC Bioinformatics. 2017 Jun 24;18(1):315. doi: 10.1186/s12859-017-1725-6.

Abstract

BACKGROUND

Multiple sequence alignment (MSA) is a classic and powerful technique for sequence analysis in bioinformatics. With the rapid growth of biological datasets, parallelizing MSA has become necessary to keep its running time at an acceptable level. Although there is a large body of work on MSA, existing approaches are either insufficient or rely on implicit assumptions that limit their generality. First, the properties of users' input, including the size of the dataset and the lengths of the sequences, can take arbitrary values and are generally unknown until the sequences are submitted; previous work has unfortunately ignored this. Second, the center star strategy is well suited to aligning similar sequences, but its first stage, center sequence selection, is highly time-consuming and requires further optimization. Moreover, on heterogeneous CPU/GPU platforms, prior studies parallelize MSA on the GPU devices only, leaving the CPUs idle during the computation. A co-run computation model, by contrast, can maximize the utilization of computing resources by processing the workload on both the CPU and the GPU simultaneously.
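For reference, the center star strategy first picks a center sequence and then aligns every other sequence to it. A minimal sketch of the naive first stage makes its cost visible. It assumes Levenshtein edit distance as the pairwise measure (published variants often maximize an alignment score instead), and the helper names `edit_distance` and `select_center_naive` are hypothetical, not taken from CMSA. Every pair of sequences is aligned, so m sequences of length about n need m(m-1)/2 dynamic-programming alignments of O(n^2) each.

```python
from typing import List


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance with a two-row DP table: O(len(a) * len(b)) time."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i] + [0] * len(b)
        for j, cb in enumerate(b, start=1):
            curr[j] = min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # match (0) or substitution (1)
            )
        prev = curr
    return prev[-1]


def select_center_naive(seqs: List[str]) -> int:
    """Return the index of the sequence with the smallest summed distance to all others.

    Hypothetical baseline: every unordered pair is compared once, i.e.
    m*(m-1)/2 DP alignments of O(n^2) each for m sequences of length about n.
    """
    m = len(seqs)
    totals = [0] * m
    for i in range(m):
        for j in range(i + 1, m):
            d = edit_distance(seqs[i], seqs[j])
            totals[i] += d
            totals[j] += d
    return min(range(m), key=totals.__getitem__)  # ties go to the lowest index
```

For example, `select_center_naive(["AAAA", "AAAT", "AATT"])` returns 1, because "AAAT" is one edit away from each of the other two sequences. The quadratic number of pairwise alignments is what makes this stage expensive on large datasets.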

RESULTS

This paper presents CMSA, a robust and efficient MSA system for large-scale datasets on heterogeneous CPU/GPU platforms. It automatically performs and optimizes multiple sequence alignment for users' submitted sequences without any assumptions. CMSA adopts the co-run computation model so that both CPU and GPU devices are fully utilized. Moreover, CMSA proposes an improved center star strategy that reduces the time complexity of its center sequence selection process from $O(mn^2)$ to $O(mn)$. Experimental results show that CMSA achieves a speedup of up to 11× and outperforms state-of-the-art software.
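The co-run idea can be illustrated with a static split. If the CPU and the GPU process alignment tasks at measured rates r_cpu and r_gpu (tasks per second), giving the GPU a share r_gpu / (r_cpu + r_gpu) of the tasks lets both devices finish at about the same time. The sketch below assumes such a static, proportional split; the abstract does not describe CMSA's actual scheduling policy, and `split_workload`, `co_run`, and the two worker callables are hypothetical stand-ins (no real GPU kernel is invoked).

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence, Tuple


def split_workload(num_tasks: int, cpu_rate: float, gpu_rate: float) -> Tuple[int, int]:
    """Split num_tasks so the CPU and GPU shares finish at about the same time.

    Assumption: throughput is constant per task, so each device's share is
    proportional to its measured rate.
    """
    if cpu_rate <= 0 or gpu_rate <= 0:
        raise ValueError("rates must be positive")
    gpu_tasks = round(num_tasks * gpu_rate / (cpu_rate + gpu_rate))
    return num_tasks - gpu_tasks, gpu_tasks


def co_run(
    tasks: Sequence,
    cpu_rate: float,
    gpu_rate: float,
    cpu_worker: Callable[[Sequence], List],
    gpu_worker: Callable[[Sequence], List],
) -> List:
    """Run the CPU share and the GPU share concurrently; results keep task order."""
    n_cpu, _ = split_workload(len(tasks), cpu_rate, gpu_rate)
    with ThreadPoolExecutor(max_workers=2) as pool:
        cpu_future = pool.submit(cpu_worker, tasks[:n_cpu])  # first n_cpu tasks on the CPU
        gpu_future = pool.submit(gpu_worker, tasks[n_cpu:])  # remaining tasks on the GPU
        return cpu_future.result() + gpu_future.result()
```

For instance, `split_workload(100, 1.0, 9.0)` gives `(10, 90)`: a GPU that is nine times faster receives nine times as many tasks. In Python, running the two shares in threads only overlaps real work when the GPU-side call releases the GIL while it waits on the device; a native implementation would typically drive each device from its own host thread.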

CONCLUSION

CMSA focuses on multiple similar RNA/DNA sequence alignment and proposes a novel bitmap-based algorithm to improve the center star strategy. We conclude that harnessing the high performance of modern GPUs is a promising approach to accelerating multiple sequence alignment. In addition, adopting the co-run computation model can significantly improve overall system utilization. The source code is available at https://github.com/wangvsa/CMSA.
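The abstract does not spell out the bitmap-based algorithm. The sketch below is one plausible reading, not CMSA's code: it assumes each sequence is summarized by a bitmap recording which k-mers it contains, and that the center is the sequence whose k-mers are shared by the most other sequences. The names `kmer_bitmap`, `set_bits`, and `select_center_bitmap`, the default k, and the scoring rule are all hypothetical. Replacing pairwise alignment with counting is what allows selection time to grow linearly with the total input size.

```python
from typing import Iterator, List

# Hypothetical 2-bit encoding; RNA "U" is treated like DNA "T".
BASES = {"A": 0, "C": 1, "G": 2, "T": 3, "U": 3}


def kmer_bitmap(seq: str, k: int) -> int:
    """Return an int whose bit c is set when the k-mer with code c occurs in seq.

    Windows containing a symbol outside ACGTU (e.g. "N") are skipped.
    """
    bitmap = 0
    for start in range(len(seq) - k + 1):
        code, valid = 0, True
        for ch in seq[start:start + k].upper():
            if ch not in BASES:
                valid = False
                break
            code = code * 4 + BASES[ch]
        if valid:
            bitmap |= 1 << code
    return bitmap


def set_bits(bm: int) -> Iterator[int]:
    """Yield the positions of the set bits of bm, lowest first."""
    while bm:
        low = bm & -bm
        yield low.bit_length() - 1
        bm ^= low


def select_center_bitmap(seqs: List[str], k: int = 4) -> int:
    """Pick the sequence whose k-mers are shared by the most other sequences.

    For a fixed small k, each pass visits every sequence's distinct k-mers once,
    so the work grows linearly with total input size (about m*n), with no
    pairwise alignment.
    """
    bitmaps = [kmer_bitmap(s, k) for s in seqs]

    # Pass 1: for every k-mer code, count how many sequences contain it.
    counts = [0] * (4 ** k)
    for bm in bitmaps:
        for code in set_bits(bm):
            counts[code] += 1

    # Pass 2: score each sequence by how many *other* sequences share each of its k-mers.
    scores = [sum(counts[code] - 1 for code in set_bits(bm)) for bm in bitmaps]
    return max(range(len(seqs)), key=scores.__getitem__)  # ties go to the lowest index
```

With `select_center_bitmap(["AAAA", "AAAT", "AATT", "ATTT"], k=2)` the result is 2: "AATT" contains AA, AT, and TT, each of which also appears in other sequences. The score is a heuristic proxy for closeness that works best when the inputs are similar, the case CMSA targets; it can favor sequences with more distinct k-mers.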

Fig. 1 (image not reproduced): https://cdn.ncbi.nlm.nih.gov/pmc/blobs/796c/5483318/f386a61b92b1/12859_2017_1725_Fig1_HTML.jpg