Alser Mohammed, Shahroodi Taha, Gómez-Luna Juan, Alkan Can, Mutlu Onur
Department of Computer Science, ETH Zurich, Zurich 8006, Switzerland.
Department of Information Technology and Electrical Engineering, ETH Zurich, Zurich 8006, Switzerland.
Bioinformatics. 2021 Apr 1;36(22-23):5282-5290. doi: 10.1093/bioinformatics/btaa1015.
We introduce SneakySnake, a highly parallel and highly accurate pre-alignment filter that remarkably reduces the need for computationally costly sequence alignment. The key idea of SneakySnake is to reduce the approximate string matching (ASM) problem to the single net routing (SNR) problem in VLSI chip layout. In the SNR problem, we are interested in finding the optimal path that connects two terminals with the least routing cost on a special grid layout that contains obstacles. The SneakySnake algorithm quickly solves the SNR problem and uses the found optimal path to decide whether or not performing sequence alignment is necessary. Reducing the ASM problem into SNR also makes SneakySnake efficient to implement on CPUs, GPUs and FPGAs.
SneakySnake significantly improves the accuracy of pre-alignment filtering by up to four orders of magnitude compared to the state-of-the-art pre-alignment filters, Shouji, GateKeeper and SHD. For short sequences, SneakySnake accelerates Edlib (state-of-the-art implementation of Myers's bit-vector algorithm) and Parasail (state-of-the-art sequence aligner with a configurable scoring function), by up to 37.7× and 43.9× (>12× on average), respectively, with its CPU implementation, and by up to 413× and 689× (>400× on average), respectively, with FPGA and GPU acceleration. For long sequences, the CPU implementation of SneakySnake accelerates Parasail and KSW2 (sequence aligner of minimap2) by up to 979× (276.9× on average) and 91.7× (31.7× on average), respectively. As SneakySnake does not replace sequence alignment, users can still obtain all capabilities (e.g. configurable scoring functions) of the aligner of their choice, unlike existing acceleration efforts that sacrifice some aligner capabilities.
https://github.com/CMU-SAFARI/SneakySnake.
Supplementary data are available at Bioinformatics online.
我们引入了SneakySnake,这是一种高度并行且高度准确的预比对过滤器,能显著减少对计算成本高昂的序列比对的需求。SneakySnake的关键思想是将近似字符串匹配(ASM)问题简化为超大规模集成电路(VLSI)芯片布局中的单网路由(SNR)问题。在SNR问题中,我们关注的是在包含障碍物的特殊网格布局上找到连接两个终端且路由成本最低的最优路径。SneakySnake算法能快速解决SNR问题,并利用找到的最优路径来决定是否有必要进行序列比对。将ASM问题简化为SNR问题也使得SneakySnake能够在CPU、GPU和FPGA上高效实现。
与当前最先进的预比对过滤器Shouji、GateKeeper和SHD相比,SneakySnake将预比对过滤的准确率显著提高了多达四个数量级。对于短序列,SneakySnake的CPU实现分别将Edlib(Myers比特向量算法的当前最先进实现)和Parasail(具有可配置评分函数的当前最先进序列比对器)加速了多达37.7倍和43.9倍(平均>12倍),而通过FPGA和GPU加速时分别高达413倍和689倍(平均>400倍)。对于长序列,SneakySnake的CPU实现分别将Parasail和KSW2(minimap2的序列比对器)加速了多达979倍(平均276.9倍)和91.7倍(平均31.7倍)。由于SneakySnake并不替代序列比对,与现有的牺牲某些比对器功能的加速方法不同,用户仍然可以获得他们选择的比对器的所有功能(例如可配置评分函数)。
https://github.com/CMU - SAFARI/SneakySnake。
补充数据可在《生物信息学》在线获取。