Shouji: a fast and efficient pre-alignment filter for sequence alignment.

Affiliations

Computer Science Department, ETH Zürich, Zürich, Switzerland.

Chair for Processor Design, Center For Advancing Electronics Dresden, Institute of Computer Engineering, Technische Universität Dresden, Dresden, Germany.

Publication information

Bioinformatics. 2019 Nov 1;35(21):4255-4263. doi: 10.1093/bioinformatics/btz234.

DOI:10.1093/bioinformatics/btz234

PMID:30923804

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC6821304/

Abstract

MOTIVATION

The ability to generate massive amounts of sequencing data continues to overwhelm the processing capability of existing algorithms and compute infrastructures. In this work, we explore the use of hardware/software co-design and hardware acceleration to significantly reduce the execution time of short sequence alignment, a crucial step in analyzing sequenced genomes. We introduce Shouji, a highly parallel and accurate pre-alignment filter that substantially reduces the need for computationally costly dynamic programming algorithms. The first key idea of our proposed pre-alignment filter is to provide high filtering accuracy by correctly detecting all common subsequences shared between two given sequences. The second key idea is to design a hardware accelerator that adopts modern field-programmable gate array (FPGA) architectures to further boost the performance of our algorithm.
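To make the role of a pre-alignment filter concrete, the following is a minimal Python sketch of the general filter-then-verify pattern. It is not Shouji's algorithm: the filter here is a simple base-count lower bound on edit distance, swapped in purely for illustration, and the names `lower_bound`, `passes_filter`, and `verify_pairs` are hypothetical. The sketch assumes that a pair is acceptable when its edit distance is at most a user-chosen `threshold`.

```python
from collections import Counter


def edit_distance(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance, O(len(a) * len(b)) time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def lower_bound(a: str, b: str) -> int:
    """Cheap lower bound on edit distance (illustrative stand-in filter).

    A single edit changes the summed per-base count difference by at most 2
    and the length difference by at most 1, so both quantities bound the
    true edit distance from below.
    """
    ca, cb = Counter(a), Counter(b)
    count_diff = sum(abs(ca[c] - cb[c]) for c in set(ca) | set(cb))
    return max(abs(len(a) - len(b)), (count_diff + 1) // 2)


def passes_filter(read: str, ref: str, threshold: int) -> bool:
    """Reject a pair only when it provably exceeds the threshold."""
    return lower_bound(read, ref) <= threshold


def verify_pairs(pairs, threshold):
    """Run the costly DP step only on pairs that survive the filter."""
    accepted, dp_calls = [], 0
    for read, ref in pairs:
        if not passes_filter(read, ref, threshold):
            continue  # filtered out without any dynamic programming
        dp_calls += 1
        if edit_distance(read, ref) <= threshold:
            accepted.append((read, ref))
    return accepted, dp_calls


pairs = [
    ("ACGTACGT", "ACGTACGT"),  # identical
    ("ACGTACGT", "ACGAACGT"),  # one substitution
    ("AAAAAAAA", "CCCCCCCC"),  # rejected by the filter
    ("ACGT", "TGCA"),          # passes the filter, rejected by DP
]
accepted, dp_calls = verify_pairs(pairs, threshold=1)
print(accepted, dp_calls)
```

Because the filter is a lower bound, it never discards a pair that the dynamic programming step would have accepted; it can only let some dissimilar pairs through, as the last pair shows. Reducing the number of such falsely accepted pairs is what filtering accuracy refers to in this setting.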

RESULTS

Shouji significantly improves the accuracy of pre-alignment filtering by up to two orders of magnitude compared to the state-of-the-art pre-alignment filters, GateKeeper and SHD. Our FPGA-based accelerator is up to three orders of magnitude faster than the equivalent CPU implementation of Shouji. Using a single FPGA chip, we benchmark the benefits of integrating Shouji with five state-of-the-art sequence aligners, designed for different computing platforms. The addition of Shouji as a pre-alignment step reduces the execution time of the five state-of-the-art sequence aligners by up to 18.8×. Shouji can be adapted for any bioinformatics pipeline that performs sequence alignment for verification. Unlike most existing methods that aim to accelerate sequence alignment, Shouji does not sacrifice any of the aligner capabilities, as it does not modify or replace the alignment step.

AVAILABILITY AND IMPLEMENTATION

https://github.com/CMU-SAFARI/Shouji.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.
