Suppr超能文献

实现两个和三个序列的史密斯-沃特曼比对算法的并行平铺码

Parallel Tiled Codes Implementing the Smith-Waterman Alignment Algorithm for Two and Three Sequences.

作者信息

Palkowski Marek, Bielecki Wlodzimierz

机构信息

Faculty of Computer Science, West Pomeranian University of Technology , Szczecin, Poland .

出版信息

J Comput Biol. 2018 Oct;25(10):1106-1119. doi: 10.1089/cmb.2018.0006. Epub 2018 Jul 11.

Abstract

The Smith-Waterman (SW) algorithm explores all the possible alignments between two or more sequences and as a result it returns the optimal local alignment. However, the computational cost of this algorithm is very high, and the exponential growth of computation makes SW unrealistic for searching similarities in large sets of sequences. Fortunately, the dynamic programming kernel of the SW algorithm involves mathematical operations over affine control loops whose iteration space can be represented by the polyhedral model. This allows us to apply polyhedral compilation techniques to optimize the studied SW dense array code. In this article, we present an approach to generate efficient SW implementations for two and three sequences by using the transitive closure of a dependence graph and loop skewing. Generated programs are represented with parallel tiled loop nests, which expose significantly higher performance than that of programs obtained with closely related compilers. The approach is able to tile all loops of original loop nests as opposed to well-known affine transformation techniques. Furthermore, it allows for code optimization of three-sequence alignment. Such a code cannot be generated by means of state-of-the-art automatic optimizing compilers. We demonstrate that an under-approximation of transitive closure (instead of exact transitive closure) can be used to generate valid parallel tiled code. This considerably reduces the computational complexity of the approach. Generated codes were run on cores of a modern Intel multiprocessor and they expose high speedup and good scalability on this platform.

摘要

史密斯-沃特曼(SW)算法会探索两个或多个序列之间所有可能的比对,因此它会返回最优的局部比对结果。然而,该算法的计算成本非常高,而且计算量的指数级增长使得SW算法在搜索大量序列集的相似性时不切实际。幸运的是,SW算法的动态规划内核涉及仿射控制循环上的数学运算,其迭代空间可以用多面体模型来表示。这使我们能够应用多面体编译技术来优化所研究的SW密集数组代码。在本文中,我们提出了一种方法,通过使用依赖图的传递闭包和循环偏斜来生成针对两个和三个序列的高效SW实现。生成的程序用并行平铺循环嵌套来表示,其性能比使用密切相关的编译器获得的程序显著更高。与众所周知的仿射变换技术不同,该方法能够平铺原始循环嵌套的所有循环。此外,它还允许对三序列比对进行代码优化。这种代码无法通过最先进的自动优化编译器生成。我们证明,可以使用传递闭包的欠近似(而不是精确传递闭包)来生成有效的并行平铺代码。这大大降低了该方法的计算复杂度。生成的代码在现代英特尔多处理器的核心上运行,在该平台上展现出了高加速比和良好的可扩展性。

文献检索

告别复杂PubMed语法,用中文像聊天一样搜索,搜遍4000万医学文献。AI智能推荐,让科研检索更轻松。

立即免费搜索

文件翻译

保留排版,准确专业,支持PDF/Word/PPT等文件格式,支持 12+语言互译。

免费翻译文档

深度研究

AI帮你快速写综述,25分钟生成高质量综述,智能提取关键信息,辅助科研写作。

立即免费体验