Laboratory of Machine Learning and Intelligent Instrumentation, nPITI/IMD, Federal University of Rio Grande do Norte, Natal, Brazil.
Bioinformatics Multidisciplinary Environment (BioME), Federal University of Rio Grande do Norte, Natal 59078-970, RN, Brazil.
PLoS One. 2022 Jun 30;17(6):e0254736. doi: 10.1371/journal.pone.0254736. eCollection 2022.
In bioinformatics, alignment is an essential technique for finding similarities between biological sequences. Alignment is usually performed with the Smith-Waterman (SW) algorithm, a well-known, high-precision sequence alignment technique based on dynamic programming. However, given the massive volume of data in biological databases and its continued exponential growth, high-speed data processing is necessary. This work therefore proposes a parallel hardware design for the SW algorithm, based on a systolic array structure, that accelerates both the forward and backtracking steps. To this end, the architecture calculates and stores the paths during the forward stage, which pre-organizes the alignment and reduces the complexity of the backtracking stage. Backtracking starts from the maximum-score position in the matrix and generates the optimal SW sequence alignment path. The architecture was validated on a Field-Programmable Gate Array (FPGA), and synthesis analyses show that the proposed design reaches up to 79.5 Giga Cell Updates per Second (GCUPS).
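The two stages described in the abstract can be illustrated with a minimal sequential sketch in software. This is not the proposed hardware design: it is plain Smith-Waterman with a linear gap penalty, where the forward pass fills the score matrix and records, for each cell, the direction its score came from, so that backtracking from the maximum-score cell only has to follow stored pointers. The function name `smith_waterman` and the scoring values (`match=2`, `mismatch=-1`, `gap=-1`) are assumptions chosen for illustration and are not taken from the paper.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-1):
    """Local alignment sketch (linear gap penalty; scoring values are assumed).

    Returns (best_score, aligned_a, aligned_b).
    """
    STOP, DIAG, UP, LEFT = 0, 1, 2, 3
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]       # score matrix H
    direction = [[STOP] * cols for _ in range(rows)]  # stored paths
    best, best_pos = 0, (0, 0)

    # Forward stage: compute each cell and remember where its score came from.
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = score[i - 1][j] + gap
            left = score[i][j - 1] + gap
            cell = max(0, diag, up, left)
            score[i][j] = cell
            if cell == 0:
                direction[i][j] = STOP
            elif cell == diag:
                direction[i][j] = DIAG
            elif cell == up:
                direction[i][j] = UP
            else:
                direction[i][j] = LEFT
            if cell > best:
                best, best_pos = cell, (i, j)

    # Backtracking stage: start at the maximum score and follow stored pointers.
    i, j = best_pos
    out_a, out_b = [], []
    while direction[i][j] != STOP:
        if direction[i][j] == DIAG:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif direction[i][j] == UP:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))
```

For example, `smith_waterman("TTACGTTT", "GGACGGG")` aligns the shared `ACG` segment under the assumed scoring. In software the forward stage visits the cells one at a time. Cells on the same anti-diagonal do not depend on each other, and a systolic array can exploit that independence by updating them in parallel.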