Department of Computer Engineering, Bilkent University, Bilkent, Ankara 06800, Turkey.
TOBB University of Economics & Technology, Sogutozu, Ankara, Turkey.
Bioinformatics. 2017 Nov 1;33(21):3355-3363. doi: 10.1093/bioinformatics/btx342.
High-throughput DNA sequencing (HTS) technologies generate an enormous number of small DNA segments, called short reads, that impose a significant computational burden. To analyze the entire genome, each of the billions of short reads must be mapped to a reference genome based on the similarity between the read and 'candidate' locations in that reference genome. This similarity measurement, called alignment and formulated as an approximate string matching problem, is the computational bottleneck because: (i) it is implemented using quadratic-time dynamic programming algorithms and (ii) the majority of candidate locations in the reference genome do not align with a given read due to high dissimilarity. Computing alignments for such incorrect candidate locations consumes an overwhelming majority of a modern read mapper's execution time. Therefore, it is crucial to develop a fast and effective filter that can detect incorrect candidate locations and eliminate them before the computationally costly alignment algorithms are invoked.
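To make the quadratic cost concrete, here is a minimal sketch of alignment as approximate string matching, assuming plain unit-cost edit distance (substitutions, insertions, deletions) as the similarity measure. The names `edit_distance`, `aligns`, and `max_edits` are illustrative and do not come from the paper; real read mappers use more elaborate, optimized variants.

```python
def edit_distance(read: str, ref: str) -> int:
    """Unit-cost edit distance via the textbook dynamic program.

    Time is O(len(read) * len(ref)), i.e. quadratic for equal lengths.
    Only two rows of the DP table are kept in memory.
    """
    prev = list(range(len(ref) + 1))  # distance from empty read prefix
    for i, r in enumerate(read, start=1):
        curr = [i]  # distance from read[:i] to empty reference prefix
        for j, c in enumerate(ref, start=1):
            cost = 0 if r == c else 1
            curr.append(min(prev[j] + 1,          # drop a read character
                            curr[j - 1] + 1,      # drop a reference character
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def aligns(read: str, candidate: str, max_edits: int) -> bool:
    """Hypothetical verification step: accept if within the edit threshold."""
    return edit_distance(read, candidate) <= max_edits
```

Every candidate location pays the full cost of this table, even when the read and the reference segment are obviously dissimilar, which is the waste a pre-alignment filter aims to avoid.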
We propose GateKeeper, a new hardware accelerator that functions as a pre-alignment step that quickly filters out most incorrect candidate locations. GateKeeper is the first design to accelerate pre-alignment using Field-Programmable Gate Arrays (FPGAs), which can perform pre-alignment much faster than software. When implemented on a single FPGA chip, GateKeeper maintains high accuracy (on average >96%) while providing, on average, 90-fold and 130-fold speedup over the state-of-the-art software pre-alignment techniques, Adjacency Filter and Shifted Hamming Distance (SHD), respectively. The addition of GateKeeper as a pre-alignment step can reduce the verification time of the mrFAST mapper by a factor of 10.
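The abstract does not describe GateKeeper's internals, so the following is only a simplified software sketch of what a pre-alignment filter can look like. It assumes a shifted-mismatch-mask idea loosely in the spirit of Shifted Hamming Distance; it is not the FPGA design or the authors' algorithm. The function name `shifted_mask_filter` and the parameter `max_edits` are hypothetical.

```python
def shifted_mask_filter(read: str, ref: str, max_edits: int) -> bool:
    """Cheap pre-alignment check on an equal-length read/reference pair.

    A read position "survives" if it mismatches the reference under every
    shift in [-max_edits, +max_edits]. A pair is passed on to full
    alignment only if at most max_edits positions survive.
    """
    n = len(read)
    if len(ref) != n:
        raise ValueError("this sketch assumes equal-length sequences")
    surviving = [True] * n
    for shift in range(-max_edits, max_edits + 1):
        for i in range(n):
            j = i + shift
            # Out-of-range reference positions count as mismatches.
            if 0 <= j < n and read[i] == ref[j]:
                surviving[i] = False
    return sum(surviving) <= max_edits


# Usage: only pairs that pass the filter would go on to costly alignment.
candidates = ["ACGTACGT", "TTTTTTTT"]
to_verify = [c for c in candidates if shifted_mask_filter("ACGTACGT", c, 2)]
```

A filter of this kind is deliberately permissive: it can let through some pairs that full alignment later rejects, so the alignment step is still required for every pair that passes.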
Availability and implementation: https://github.com/BilkentCompGen/GateKeeper.
Contact: mohammedalser@bilkent.edu.tr or onur.mutlu@inf.ethz.ch or calkan@cs.bilkent.edu.tr.
Supplementary data are available at Bioinformatics online.