Chuan Jiacheng, Zhou Aiguo, Hale Lawrence Richard, He Miao, Li Xiang
Canadian Food Inspection Agency, Charlottetown, PE C1A5T1, Canada.
University of Prince Edward Island, Department of Biology, Charlottetown, PE C1A4P3, Canada.
GigaByte. 2021 Oct 15;2021:gigabyte31. doi: 10.46471/gigabyte.31. eCollection 2021.
With advances in next-generation sequencing, adapters attached to reads and low-quality bases directly and implicitly hinder downstream analysis. For example, they can produce false-positive single nucleotide polymorphisms (SNPs) and generate fragmented assemblies. There is a need for a fast trimming algorithm to remove adapters precisely, especially in read tails with relatively low quality. Here, we present Atria, a trimming program that matches the adapters in paired reads and finds possible overlapped regions using a fast and carefully designed byte-based matching algorithm (O(n) time with O(1) space). Atria also implements multi-threading in both sequence processing and file compression and supports single-end reads. Compared with other trimmers, Atria performs favorably in various trimming and runtime benchmarks of both simulated and real data. We also provide a fast and lightweight byte-based matching algorithm, which can be used in various short-sequence matching applications, such as primer search and seed scanning before alignment.
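The abstract does not spell out the byte-based matching algorithm, so the following is only a minimal sketch of one way such a matcher could work, not Atria's actual implementation. It assumes a hypothetical 4-bit one-hot encoding per base, packs the first `k` adapter bases into one integer, and slides a rolling window over the read, counting matches with a bitwise AND and a population count. The names `CODE`, `pack`, and `find_adapter`, and the parameters `k` and `max_mismatch`, are illustrative assumptions. Each step does a constant amount of work on a fixed-width window, which is consistent with linear time and constant extra space.

```python
# Hypothetical 4-bit one-hot codes (an assumption, not Atria's documented encoding).
# N sets all four bits, so an N in the read matches any adapter base.
CODE = {"A": 0b0001, "C": 0b0010, "G": 0b0100, "T": 0b1000, "N": 0b1111}


def pack(seq: str) -> int:
    """Pack a sequence into an integer, 4 bits per base, first base in the lowest bits."""
    word = 0
    for i, base in enumerate(seq):
        word |= CODE[base] << (4 * i)
    return word


def find_adapter(read: str, adapter: str, k: int = 16, max_mismatch: int = 2):
    """Return (position, matches) for the best hit of the first k adapter bases, or None.

    The adapter probe is assumed to contain only A, C, G, and T, so every
    matching base contributes exactly one set bit to the AND result.
    """
    k = min(k, len(adapter))
    if k == 0 or len(read) < k:
        return None
    probe = pack(adapter[:k])
    window = pack(read[:k])
    best = None
    for pos in range(len(read) - k + 1):
        if pos > 0:
            # Roll the window: drop the oldest base, append the newest at the top.
            window = (window >> 4) | (CODE[read[pos + k - 1]] << (4 * (k - 1)))
        matches = bin(window & probe).count("1")
        if matches >= k - max_mismatch and (best is None or matches > best[1]):
            best = (pos, matches)
    return best


# Example input: a 15-base insert followed by the start of an Illumina TruSeq adapter.
adapter = "AGATCGGAAGAGCACACGTCTGAACTCCAGTCA"
read = "ACGTACGTTTGACCA" + "AGATCGGAAGAGCACACGTC"
hit = find_adapter(read, adapter)
if hit is not None:
    trimmed = read[: hit[0]]  # keep only the bases before the adapter
```

With `k = 16` the probe fits in a single 64-bit word, so a compiled implementation could do the comparison with one AND and one popcount instruction per position. The same routine could serve other short-sequence searches, such as locating a primer in a read.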