National Center for Biotechnology Information, National Library of Medicine, 8600 Rockville Pike, Bethesda, MD 20894, USA.
Gigascience. 2020 Apr 1;9(4). doi: 10.1093/gigascience/giaa023.
Alignment of sequence reads generated by next-generation sequencing is an integral part of most pipelines analyzing next-generation sequencing data. A number of tools designed to quickly align a large volume of sequences are already available. However, most existing tools lack explicit guarantees about their output. They also do not support searching genome assemblies, such as the human genome assembly GRCh38, that include primary and alternate sequences together with information on the placement of alternate sequences relative to primary sequences in the assembly.
This paper describes SRPRISM (Single Read Paired Read Indel Substitution Minimizer), an alignment tool for aligning reads without splices. SRPRISM has features not available in most tools, such as (i) support for searching genome assemblies with alternate sequences, (ii) partial alignment of reads with a specified region of reads to be included in the alignment, (iii) choice of ranking schemes for alignments, and (iv) explicit criteria for search sensitivity. We compare the performance of SRPRISM to GEM, Kart, STAR, BWA-MEM, Bowtie2, Hobbes, and Yara using benchmark sets for paired and single reads of lengths 100 and 250 bp generated using DWGSIM. SRPRISM found the best results for most benchmark sets with error rates of up to ∼2.5%, whereas GEM performed best at higher error rates. SRPRISM was also more sensitive than the other tools even when its sensitivity was reduced to improve run-time performance.
We present SRPRISM as a flexible read mapping tool that provides explicit guarantees on results.