Computer Science Department, ETH Zürich, Zürich, Switzerland.
Chair for Processor Design, Center for Advancing Electronics Dresden, Institute of Computer Engineering, Technische Universität Dresden, Dresden, Germany.
Bioinformatics. 2019 Nov 1;35(21):4255-4263. doi: 10.1093/bioinformatics/btz234.
The ability to generate massive amounts of sequencing data continues to overwhelm the processing capability of existing algorithms and compute infrastructures. In this work, we explore the use of hardware/software co-design and hardware acceleration to significantly reduce the execution time of short sequence alignment, a crucial step in analyzing sequenced genomes. We introduce Shouji, a highly parallel and accurate pre-alignment filter that substantially reduces the need for computationally costly dynamic programming algorithms. The first key idea of our proposed pre-alignment filter is to provide high filtering accuracy by correctly detecting all common subsequences shared between two given sequences. The second key idea is to design a hardware accelerator that exploits modern field-programmable gate array (FPGA) architectures to further boost the performance of our algorithm.
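The general role of a pre-alignment filter can be sketched in a few lines of Python. The sketch below is not Shouji's algorithm: it swaps in a much simpler base-counting lower bound purely to illustrate the idea of cheaply rejecting dissimilar pairs before dynamic programming. The function names (`base_count_lower_bound`, `edit_distance`, `filter_then_align`) and the `threshold` parameter are hypothetical and chosen only for this example.

```python
from collections import Counter


def base_count_lower_bound(read: str, ref: str) -> int:
    """Cheap lower bound on the edit distance (NOT Shouji's method).

    A substitution changes the per-character counts by at most 2 in total,
    and an insertion or deletion by exactly 1, so ceil(L1 / 2) can never
    exceed the true edit distance.
    """
    counts_read = Counter(read)
    counts_ref = Counter(ref)
    alphabet = set(counts_read) | set(counts_ref)
    l1 = sum(abs(counts_read[c] - counts_ref[c]) for c in alphabet)
    return (l1 + 1) // 2  # ceil(l1 / 2)


def edit_distance(a: str, b: str) -> int:
    """Standard dynamic-programming edit distance (the costly step)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def filter_then_align(pairs, threshold):
    """Run the filter first; align only the pairs that survive it.

    Returns (accepted, skipped): accepted holds (read, ref, distance) for
    pairs within the threshold, skipped counts pairs the filter rejected
    without running dynamic programming.
    """
    accepted = []
    skipped = 0
    for read, ref in pairs:
        if base_count_lower_bound(read, ref) > threshold:
            skipped += 1  # provably too dissimilar, no alignment needed
            continue
        distance = edit_distance(read, ref)
        if distance <= threshold:
            accepted.append((read, ref, distance))
    return accepted, skipped
```

Because the bound never exceeds the true edit distance, this toy filter never rejects a pair that is within the threshold. It does, however, let through many dissimilar pairs (for example, two sequences with the same base composition in a different order), which is the kind of inaccuracy that more precise filters aim to reduce.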
Shouji improves the accuracy of pre-alignment filtering by up to two orders of magnitude compared to the state-of-the-art pre-alignment filters, GateKeeper and SHD. Our FPGA-based accelerator is up to three orders of magnitude faster than the equivalent CPU implementation of Shouji. Using a single FPGA chip, we benchmark the benefits of integrating Shouji with five state-of-the-art sequence aligners designed for different computing platforms. Adding Shouji as a pre-alignment step reduces the execution time of the five aligners by up to 18.8×. Shouji can be adapted for any bioinformatics pipeline that performs sequence alignment for verification. Unlike most existing methods that aim to accelerate sequence alignment, Shouji does not sacrifice any of the aligner's capabilities, as it does not modify or replace the alignment step.
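A rough back-of-the-envelope model shows why rejecting pairs before alignment shortens end-to-end time. This model is an assumption for illustration and is not taken from the paper: let $t_a$ be the average alignment time per sequence pair, $t_f$ the average filtering time per pair, and $r$ the fraction of pairs the filter rejects (all three symbols are hypothetical).

```latex
% Every pair pays the filter cost t_f; only the surviving fraction (1 - r)
% pays the alignment cost t_a.
S \;=\; \frac{t_a}{t_f + (1 - r)\,t_a}
% Illustrative values only (not measurements): r = 0.95, t_f = 0.01\,t_a
S \;=\; \frac{1}{0.01 + 0.05} \;\approx\; 16.7
```

Under this model the speedup is capped at $1/(1-r)$ even when filtering is free, so both the filter's accuracy and its speed determine how much an aligner gains.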
Shouji is available at https://github.com/CMU-SAFARI/Shouji.
Supplementary data are available at Bioinformatics online.