Gaston Jeffry M, Alm Eric J, Zhang An-Ni
Google, Cambridge, MA, USA.
School of Biological Sciences, Nanyang Technological University, Singapore, Singapore.
Genome Biol. 2025 Jan 22;26(1):15. doi: 10.1186/s13059-024-03473-7.
Sequence alignment is foundational to many bioinformatic analyses. Many aligners start by splitting sequences into contiguous, fixed-length seeds called k-mers. Alignment is faster with longer, unique seeds, but more accurate with shorter seeds, which are less likely to span a mutation. Here, we introduce X-Mapper, which aims to offer both high speed and high accuracy by using dynamic-length seeds that contain gaps, called gapped x-mers. We observe 11- to 24-fold fewer suboptimal alignments when analyzing a human reference and 3- to 579-fold lower inconsistency across bacterial references than with other aligners, improving on 53% and 30% of reads aligned to non-target strains and species, respectively. Other seed-based analysis algorithms might also benefit from gapped x-mers.
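
To make the seed trade-off concrete, the following is a minimal sketch in Python. It is not X-Mapper's algorithm: the abstract does not describe how gapped x-mers are constructed, so the sketch uses a generic fixed-mask spaced seed as a stand-in for "a seed containing a gap". The function names `kmers` and `spaced_seeds`, the mask `"11011"`, and the toy sequences are all assumptions made for illustration.

```python
def kmers(seq: str, k: int) -> list[str]:
    """Return all contiguous, fixed-length k-mers of seq, in order."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def spaced_seeds(seq: str, mask: str) -> list[str]:
    """Return seeds sampled with a fixed mask ('1' keeps a base, '0' is a gap).

    Generic spaced-seed illustration only; the mask is hypothetical and
    fixed-length, unlike the dynamic-length x-mers named in the abstract.
    """
    keep = [j for j, m in enumerate(mask) if m == "1"]
    return [
        "".join(seq[i + j] for j in keep)
        for i in range(len(seq) - len(mask) + 1)
    ]


def matching_offsets(a: list[str], b: list[str]) -> list[int]:
    """Offsets at which two seed lists agree."""
    return [i for i, (x, y) in enumerate(zip(a, b)) if x == y]


# Toy data (assumed): the read differs from the reference at index 4 only.
reference = "ACGTACGGTC"
read = "ACGTTCGGTC"

# Every contiguous 5-mer that spans index 4 is broken by the substitution.
contiguous_hits = matching_offsets(kmers(reference, 5), kmers(read, 5))

# With mask 11011, the window starting at offset 2 places its gap on index 4,
# so that seed still matches despite the substitution.
spaced_hits = matching_offsets(
    spaced_seeds(reference, "11011"), spaced_seeds(read, "11011")
)

print(contiguous_hits)  # [5]
print(spaced_hits)      # [2, 5]
```

In this toy case the only contiguous 5-mer that survives lies entirely downstream of the substitution, whereas the gapped seed also matches in a window that spans it. That is the general intuition behind letting seeds skip positions.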