Department of Computer Science, Johns Hopkins University, Baltimore, MD 21218, USA.
Bioinformatics. 2021 Nov 18;37(22):4243-4245. doi: 10.1093/bioinformatics/btab396.
As more population genetics datasets and population-specific references become available, the task of translating ('lifting') read alignments from one reference coordinate system to another is becoming more common. Existing tools generally require a chain file, whereas VCF files are the more common way to represent variation. Existing tools also do not make effective use of threads, creating a post-alignment bottleneck.
LevioSAM is a tool for lifting SAM/BAM alignments from one reference to another using a VCF file containing population variants. LevioSAM uses succinct data structures and scales efficiently to many threads. When run downstream of a read aligner, with both using 16 threads, levioSAM is more than 7 times faster than the aligner.
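To illustrate what lifting a coordinate with VCF-style variants involves, the following minimal Python sketch maps a 0-based position on the original reference to the corresponding position on a variant-modified reference. It is not levioSAM's implementation: the class name `SimpleLift`, the `(pos, ref, alt)` tuple layout and the clamping rule for positions inside a deletion are assumptions made for this example, and a sorted list with binary search stands in for the succinct data structures the tool uses. Variants are assumed to be non-overlapping.

```python
from bisect import bisect_right
from itertools import accumulate


class SimpleLift:
    """Hypothetical helper: lift 0-based reference positions to a
    variant-modified ("alternate") reference. Illustrative only."""

    def __init__(self, variants):
        # Each variant is (pos, ref, alt) with a 0-based pos on the original
        # reference, mirroring VCF POS/REF/ALT. Assumed non-overlapping.
        self.variants = sorted(variants)
        # End (exclusive) of each variant's REF span on the original reference.
        self.ends = [pos + len(ref) for pos, ref, _ in self.variants]
        # Running total of length changes: insertions add, deletions subtract.
        self.cum_shift = list(
            accumulate(len(alt) - len(ref) for _, ref, alt in self.variants)
        )

    def ref_to_alt(self, p):
        # Count the variants whose REF span ends at or before p.
        i = bisect_right(self.ends, p)
        shift = self.cum_shift[i - 1] if i else 0
        if i < len(self.variants):
            pos, _, alt = self.variants[i]
            if pos <= p:
                # p lies inside this variant's REF span. Clamp to the last
                # ALT base so that deleted bases map to the anchor base.
                return pos + shift + min(p - pos, len(alt) - 1)
        return p + shift


# A 2-base insertion at position 10 and a 2-base deletion after position 20.
lift = SimpleLift([(10, "A", "ATT"), (20, "GCC", "G")])
lifted = [lift.ref_to_alt(p) for p in (5, 10, 11, 20, 21, 23)]
print(lifted)  # [5, 10, 13, 22, 22, 23]
```

In this example, positions after the insertion shift right by two, the deleted bases collapse onto the anchor base, and positions after the deletion return to their original coordinates because the two length changes cancel. Lifting a full alignment would also require updating fields such as the CIGAR string, which this sketch does not attempt.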
Software Package: https://github.com/alshai/levioSAM, Experiments: https://github.com/langmead-lab/levioSAM-experiments.
Supplementary data are available at Bioinformatics online.