Interdisciplinary Program in Bioinformatics, College of Natural Sciences, Seoul National University, 1 Gwanak-ro, Gwanak-gu, Seoul 08826, Korea.
Genome and Health Big Data Laboratory, Graduate School of Public Health, Seoul National University, 1 Gwanak-ro, Gwanak-gu, Seoul 08826, Korea.
Nucleic Acids Res. 2024 Aug 27;52(15):8717-8733. doi: 10.1093/nar/gkae607.
In biological sequence alignment, prevailing heuristic aligners achieve high throughput through several approximation techniques, but at the cost of unclear output criteria and complex parameter spaces. To address these challenges, we introduce 'SigAlign', a novel alignment algorithm that applies two explicit cutoffs to its results, minimum length and maximum penalty per length, alongside three affine gap penalties. Comparative analyses of SigAlign against leading database search tools (BLASTn, MMseqs2) and read mappers (BWA-MEM, bowtie2, HISAT2, minimap2) highlight its performance in read mapping and database searches. Our research demonstrates that SigAlign not only provides high sensitivity with a non-heuristic approach, but also surpasses the throughput of existing heuristic aligners, particularly for high-accuracy reads or genomes with few repetitive regions. As an open-source library, SigAlign is poised to become a foundational component that provides a transparent and customizable alignment process for new analytical algorithms, tools and pipelines in bioinformatics.
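The two output cutoffs can be illustrated with a minimal sketch. It is not SigAlign's API or its search algorithm; it only checks whether an already-computed alignment would satisfy the stated criteria. Several things here are assumptions made for illustration: the three penalties are taken to be mismatch, gap-open and gap-extend; a gap of k columns is charged gap_open + k * gap_extend; alignment length is counted as the number of alignment columns; and the names `Penalties`, `alignment_penalty` and `passes_cutoffs` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Penalties:
    """Hypothetical container for the three penalties (assumed meaning)."""
    mismatch: int
    gap_open: int
    gap_extend: int


def alignment_penalty(ops: str, p: Penalties) -> int:
    """Total penalty of an alignment given as one character per column.

    'M' = match, 'X' = mismatch, 'I' = insertion, 'D' = deletion.
    Assumed affine model: each gap run costs gap_open once,
    plus gap_extend for every column in the run.
    """
    total = 0
    prev = ""
    for op in ops:
        if op == "X":
            total += p.mismatch
        elif op in ("I", "D"):
            if op != prev:  # a new gap run starts here
                total += p.gap_open
            total += p.gap_extend
        elif op != "M":
            raise ValueError(f"unknown alignment operation: {op!r}")
        prev = op
    return total


def passes_cutoffs(ops: str, p: Penalties,
                   min_length: int, max_penalty_per_length: float) -> bool:
    """True if the alignment meets both cutoffs named in the abstract."""
    length = len(ops)  # assumed: length = number of alignment columns
    if length == 0 or length < min_length:
        return False
    # Compare penalty <= ratio * length to avoid a division.
    return alignment_penalty(ops, p) <= max_penalty_per_length * length


if __name__ == "__main__":
    penalties = Penalties(mismatch=4, gap_open=6, gap_extend=2)
    candidate = "M" * 48 + "X" + "M" * 51
    print(passes_cutoffs(candidate, penalties, 50, 0.1))
```

Because both cutoffs are stated directly in terms of the reported alignment, a result can be checked independently of how the search was performed, which is the kind of output clarity the abstract contrasts with heuristic aligners.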