Benefits of merging paired-end reads before pre-processing environmental metagenomics data.

Author information

Faculty of Engineering and Technology, Sri Ramachandra Institute of Higher Education and Research, Chennai, India.

Publication information

Mar Genomics. 2022 Feb;61:100914. doi: 10.1016/j.margen.2021.100914. Epub 2021 Dec 2.

Abstract

BACKGROUND

High-throughput sequencing of environmental DNA (eDNA) has applications in biodiversity monitoring, taxon abundance estimation, the study of community ecology dynamics, and marine species research and conservation. Environmental DNA, especially marine eDNA, degrades rapidly. Alongside good-quality reads, the sequencing data can contain a significant number of reads that fall slightly below the default PHRED quality threshold of 30. For quality control, trimming methods are employed, and these generally precede the merging of read pairs. In the case of eDNA, however, trimming also drops a significant percentage of reads that lie within the acceptable quality score range.
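To make the quality threshold concrete, here is a minimal sketch of how a PHRED score relates to base-call error probability, using the standard definition Q = -10 log10(P). The Phred+33 ASCII offset and the function names are assumptions for illustration; the abstract does not state which FASTQ encoding the datasets used.

```python
def phred_to_error_prob(q: float) -> float:
    """Convert a PHRED score Q to an error probability P = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def decode_phred33(quality_string: str) -> list[int]:
    """Decode a FASTQ quality string, assuming the Phred+33 ASCII offset."""
    return [ord(ch) - 33 for ch in quality_string]


def mean_quality(quality_string: str) -> float:
    """Mean PHRED score of one read (hypothetical helper for illustration)."""
    scores = decode_phred33(quality_string)
    return sum(scores) / len(scores)


if __name__ == "__main__":
    for q in (30, 25):
        print(f"Q{q}: error probability {phred_to_error_prob(q):.5f}")
```

Under this definition, Q30 corresponds to roughly one expected error per 1,000 bases and Q25 to roughly one per 316, so reads that fall "slightly below" Q30 are still fairly accurate.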

METHODS

To identify the merge tool best suited to eDNA, we merged two HiSeq paired-end eDNA datasets, without preprocessing, using FLASH (Fast Length Adjustment of SHort reads), PANDAseq, COPE, BBMerge, and VSEARCH. We assessed these tools on processing time, the quality of the merged reads, and the number of merged reads. Trimmomatic, a widely used preprocessing tool, was also assessed by preprocessing the datasets at different parameter settings for its two quality-trimming approaches: Sliding Window and Maximum Information. The preprocessed read pairs were then merged using the merge tool identified earlier.
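As a rough illustration of what merging a read pair involves, the sketch below reverse-complements the second read, searches for the overlap with the lowest mismatch ratio, and joins the two reads. It is a deliberately naive sketch, not the algorithm of FLASH or any of the other tools compared here. The function names, `min_overlap`, and `max_mismatch_ratio` are assumed for illustration, and base qualities are ignored when resolving the overlap.

```python
from typing import Optional

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def merge_pair(read1: str, read2: str, min_overlap: int = 10,
               max_mismatch_ratio: float = 0.25) -> Optional[str]:
    """Merge a read pair on its best overlap, or return None if none fits.

    Illustrative only: the overlap keeps read1's bases and ignores qualities.
    """
    rc2 = reverse_complement(read2)
    best = None  # (mismatch_ratio, overlap_length)
    # Try candidate overlaps from longest to shortest.
    for olen in range(min(len(read1), len(rc2)), min_overlap - 1, -1):
        mismatches = sum(a != b for a, b in zip(read1[-olen:], rc2[:olen]))
        ratio = mismatches / olen
        # Keep the lowest ratio; ties go to the longer overlap seen first.
        if ratio <= max_mismatch_ratio and (best is None or ratio < best[0]):
            best = (ratio, olen)
    if best is None:
        return None  # pair stays unmerged
    return read1 + rc2[best[1]:]


if __name__ == "__main__":
    fragment = "ATGGCGTACCTTAGGCATCGAATTCCGGATAGCTTACGGA"
    r1 = fragment[:25]
    r2 = reverse_complement(fragment[-25:])
    print(merge_pair(r1, r2) == fragment)
```

Pairs that return `None` here correspond to reads a merge tool would leave uncombined, which is one way reads are lost at the merge step.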

RESULTS

FLASH was the most efficient merge tool, balancing data conservation, read quality, and processing time. We compared Trimmomatic's two quality-trimming options, at increasing strictness, with FLASH's direct merge. Raw reads that were processed with Trimmomatic and then merged showed a significant drop in read count compared with the direct merge. An average of 29% of reads was dropped when reads were merged directly with FLASH. The Maximum Information option resulted in 30.7% to 68.05% read loss at the lowest and highest stringency parameters, respectively. The Sliding Window approach conserved approximately 10% more reads when the threshold was set to a PHRED score of 25 for a window of size 4. The lowered PHRED cutoff conserved about 50% of the reads that could potentially be informative. We noted no significant reduction of data while optimizing the number of bases read in a window with the ideal quality (Q) score.
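The effect of lowering the window threshold can be sketched with a simplified sliding-window trimmer: scan from the 5' end and cut the read at the first window whose mean quality falls below the threshold. This approximates the idea behind a sliding-window trim and is not Trimmomatic's exact implementation. The function name and the example quality values are invented for illustration, and the handling of reads shorter than the window is an assumption.

```python
def sliding_window_keep_length(qualities: list[int], window: int = 4,
                               min_mean_q: float = 25.0) -> int:
    """Return how many leading bases survive a simplified sliding-window trim."""
    n = len(qualities)
    if n < window:
        # Assumption: judge a too-short read by its overall mean quality.
        return n if n and sum(qualities) / n >= min_mean_q else 0
    for start in range(n - window + 1):
        window_mean = sum(qualities[start:start + window]) / window
        if window_mean < min_mean_q:
            return start  # cut at the start of the first failing window
    return n


if __name__ == "__main__":
    # Hypothetical read whose quality decays toward the 3' end.
    quals = [38, 37, 36, 35, 30, 28, 22, 18, 12, 10]
    for threshold in (30, 25):
        kept = sliding_window_keep_length(quals, window=4, min_mean_q=threshold)
        print(f"threshold Q{threshold}: keep {kept} of {len(quals)} bases")
```

In this toy read, relaxing the threshold from 30 to 25 keeps one more base. Whether a trimmed read is then retained or discarded depends on additional settings, such as a minimum length filter, that are not modeled here.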

CONCLUSIONS

Losing reads can negatively affect the downstream processing of environmental data, especially for sequence alignment studies. The quality trim-first, merge-later approach can significantly decrease the number of reads conserved, whereas direct merging of paired-end reads using FLASH conserved more than 60% of the reads. Direct merging of paired-end reads can therefore prevent the removal of potentially informative reads that do not comply with the trimming tool's strict checks. We found FLASH to be an efficient tool for conserving reads while carrying out quality trimming in moderation. Overall, our results show that merging paired-end reads of eDNA data before trimming can conserve more reads.

