Department of Information Technology and Electrical Engineering, ETH Zurich, Zurich 8006, Switzerland.
Bionano Genomics, San Diego, CA 92121, United States.
Bioinformatics. 2023 May 4;39(5). doi: 10.1093/bioinformatics/btad151.
Pairwise sequence alignment is a very time-consuming step in common bioinformatics pipelines. Speeding up this step requires heuristics, efficient implementations, and/or hardware acceleration. A promising candidate for all of the above is the recently proposed GenASM algorithm. We identify and address three inefficiencies in the GenASM algorithm: it has a high amount of data movement, a large memory footprint, and does some unnecessary work.
We propose Scrooge, a fast and memory-frugal genomic sequence aligner. Scrooge includes three novel algorithmic improvements which reduce the data movement, memory footprint, and the number of operations in the GenASM algorithm. We provide efficient open-source implementations of the Scrooge algorithm for CPUs and GPUs, which demonstrate the significant benefits of our algorithmic improvements. For long reads, the CPU version of Scrooge achieves a 20.1×, 1.7×, and 2.1× speedup over KSW2, Edlib, and a CPU implementation of GenASM, respectively. The GPU version of Scrooge achieves a 4.0×, 80.4×, 6.8×, 12.6×, and 5.9× speedup over the CPU version of Scrooge, KSW2, Edlib, Darwin-GPU, and a GPU implementation of GenASM, respectively. We estimate an ASIC implementation of Scrooge to use 3.6× less chip area and 2.1× less power than a GenASM ASIC while maintaining the same throughput. Further, we systematically analyze the throughput and accuracy behavior of GenASM and Scrooge under various configurations. As the best configuration of Scrooge depends on the computing platform, we make several observations that can help guide future implementations of Scrooge.
在常见的生物信息学管道中,成对序列比对是一个非常耗时的步骤。加快这一步需要启发式方法、高效的实现和/或硬件加速。最近提出的 GenASM 算法是所有这些的有希望的候选者。我们确定并解决了 GenASM 算法中的三个效率低下的问题:它的数据移动量很大,内存占用很大,并且做了一些不必要的工作。
我们提出了 Scrooge,一种快速且节省内存的基因组序列比对器。Scrooge 包括三个新颖的算法改进,这些改进减少了 GenASM 算法中的数据移动、内存占用和操作数量。我们为 CPU 和 GPU 提供了 Scrooge 算法的高效开源实现,这些实现证明了我们算法改进的显著优势。对于长读取,Scrooge 的 CPU 版本分别比 KSW2、Edlib 和 GenASM 的 CPU 实现快 20.1 倍、1.7 倍和 2.1 倍。Scrooge 的 GPU 版本分别比 Scrooge 的 CPU 版本、KSW2、Edlib、Darwin-GPU 和 GenASM 的 GPU 实现快 4.0 倍、80.4 倍、6.8 倍、12.6 倍和 5.9 倍。我们估计 Scrooge 的 ASIC 实现使用的芯片面积比 GenASM 的 ASIC 少 3.6 倍,功耗少 2.1 倍,而吞吐量保持不变。此外,我们系统地分析了各种配置下 GenASM 和 Scrooge 的吞吐量和准确性行为。由于 Scrooge 的最佳配置取决于计算平台,因此我们提出了一些可以帮助指导未来 Scrooge 实现的观察结果。