Oak Ridge Institute for Science and Education (ORISE), Oak Ridge, Tennessee, United States, assigned to Centers for Disease Control and Prevention, Atlanta, GA, USA.
Eagle Global Scientific LLC, contracting agency to the Centers for Disease Control and Prevention, Atlanta, GA, USA.
BMC Res Notes. 2024 Oct 14;17(1):308. doi: 10.1186/s13104-024-06951-0.
Trimming adapters and low-quality bases from next-generation sequencing (NGS) data is crucial for optimal analysis. We evaluated six trimming programs, implementing five different algorithms, for their effectiveness in trimming adapters and in improving read quality, contig assembly, and single-nucleotide polymorphism (SNP) quality and concordance for poliovirus, severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), and norovirus paired data sequenced on Illumina iSeq and MiSeq platforms. Trimmomatic and BBDuk effectively removed adapters from all datasets, unlike FastP, AdapterRemoval, SeqPurge, and Skewer. All trimmers improved read quality (Q ≥ 30: 87.8% to 96.1%) compared to raw reads (83.6% to 93.2%). Trimmers implementing traditional sequence matching (Trimmomatic and AdapterRemoval) or an overlapping algorithm (FastP) retained the highest-quality reads. While all trimmers improved the maximum contig length and genome coverage for iSeq and MiSeq viral assemblies, BBDuk-trimmed reads assembled the shortest contigs. SNP concordance was consistently high (> 97.7% to 100%) across trimmers. However, BBDuk-trimmed reads had the lowest-quality SNPs. Overall, the two adapter trimmers that used the traditional sequence-matching algorithm performed consistently across the viral datasets analyzed. Our findings guide software selection and inform future development of versatile trimmers for viral genome analysis.
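As an illustration of the Q ≥ 30 metric reported above, the following minimal sketch computes the fraction of bases with a Phred score of at least 30 from FASTQ quality strings. It assumes standard four-line FASTQ records and Phred+33 encoding; the function names `read_fastq_qualities` and `q30_fraction` are hypothetical and are not taken from the study or from any of the trimmers evaluated, and the study's own pipeline may compute this statistic differently.

```python
def read_fastq_qualities(lines):
    """Yield the quality string (4th line) of each four-line FASTQ record."""
    for i, line in enumerate(lines):
        if i % 4 == 3:
            yield line.rstrip("\n")


def q30_fraction(quality_strings, threshold=30, offset=33):
    """Return the fraction of bases whose Phred score is >= threshold.

    Assumes Phred+33 encoding (offset=33). Returns 0.0 for empty input.
    """
    total = 0
    passing = 0
    for qual in quality_strings:
        for ch in qual:
            total += 1
            # Phred score = ASCII code of the character minus the offset
            if ord(ch) - offset >= threshold:
                passing += 1
    return passing / total if total else 0.0


if __name__ == "__main__":
    # One toy record: 'I' encodes Q40, '5' encodes Q20, '?' encodes Q30
    record = ["@read1\n", "ACGT\n", "+\n", "II5?\n"]
    print(q30_fraction(read_fastq_qualities(record)))
```

Running the same calculation on raw and trimmed FASTQ files gives a before-and-after comparison analogous to the percentages quoted in the abstract.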