Read trimming has minimal effect on bacterial SNP-calling accuracy.

Affiliation

Nuffield Department of Medicine, University of Oxford, Oxford, UK.

Publication information

Microb Genom. 2020 Dec;6(12). doi: 10.1099/mgen.0.000434. Epub 2020 Dec 11.

Abstract

Read alignment is the central step of many analytic pipelines that perform variant calling. To reduce error, it is common practice to pre-process raw sequencing reads to remove low-quality bases and residual adapter contamination, a procedure collectively known as 'trimming'. Trimming is widely assumed to increase the accuracy of variant calling, although there are relatively few systematic evaluations of its effects and no clear consensus on its efficacy. As sequencing datasets increase both in number and size, it is worthwhile reappraising computational operations of ambiguous benefit, particularly when the scope of many analyses now routinely incorporates thousands of samples, increasing the time and cost required. Using a curated set of 17 Gram-negative bacterial genomes, this study initially evaluated the impact of four read-trimming utilities (Atropos, fastp, Trim Galore and Trimmomatic), each used with a range of stringencies, on the accuracy and completeness of three bacterial SNP-calling pipelines. It was found that read trimming made only small, and statistically insignificant, increases in SNP-calling accuracy even when using the highest-performing pre-processor in this study, fastp. To extend these findings, >6500 publicly archived sequencing datasets from , and were re-analysed using a common analytic pipeline. Of the approximately 125 million SNPs and 1.25 million indels called across all samples, the same bases were called in 98.8 and 91.9 % of cases, respectively, irrespective of whether raw reads or trimmed reads were used. Nevertheless, the proportion of mixed calls (i.e. calls where <100 % of the reads support the variant allele; considered a proxy of false positives) was significantly reduced after trimming, which suggests that while trimming rarely alters the set of variant bases, it can affect the proportion of reads supporting each call. 
It was concluded that read quality- and adapter-trimming add relatively little value to a SNP-calling pipeline and may only be necessary if small differences in the absolute number of SNP calls, or the false call rate, are critical. Broadly similar conclusions can be drawn about the utility of trimming to an indel-calling pipeline. Read trimming is still routinely performed prior to variant calling, likely out of concern that doing otherwise would typically have negative consequences. While historically this may have been the case, the data in this study suggest that read trimming is not always a practical necessity.
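
The two comparisons described in the abstract (how often the same base is called from raw and from trimmed reads, and what proportion of calls are mixed) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration rather than the study's own code: the `CallSet` layout, the function names `concordance` and `mixed_fraction`, the toy data, and in particular the choice of denominator (all sites called in either set), which the abstract does not specify.

```python
from typing import Dict, Tuple

# Hypothetical layout: (contig, position) -> (called base, fraction of reads supporting it).
Site = Tuple[str, int]
Call = Tuple[str, float]
CallSet = Dict[Site, Call]


def concordance(raw: CallSet, trimmed: CallSet) -> float:
    """Fraction of sites called in either set where both sets report the same base.

    Assumption: the denominator is the union of called sites.
    """
    sites = set(raw) | set(trimmed)
    if not sites:
        return 0.0
    same = sum(
        1
        for site in sites
        if site in raw and site in trimmed and raw[site][0] == trimmed[site][0]
    )
    return same / len(sites)


def mixed_fraction(calls: CallSet) -> float:
    """Fraction of calls where fewer than 100 % of reads support the called allele."""
    if not calls:
        return 0.0
    mixed = sum(1 for _base, support in calls.values() if support < 1.0)
    return mixed / len(calls)


# Toy data, invented for illustration only.
raw_calls: CallSet = {
    ("chr", 10): ("A", 1.0),
    ("chr", 20): ("T", 0.9),
    ("chr", 30): ("G", 1.0),
    ("chr", 40): ("C", 0.8),
}
trimmed_calls: CallSet = {
    ("chr", 10): ("A", 1.0),
    ("chr", 20): ("T", 1.0),
    ("chr", 30): ("G", 1.0),
}

print(concordance(raw_calls, trimmed_calls))
print(mixed_fraction(raw_calls), mixed_fraction(trimmed_calls))
```

In this toy case, trimming changes the set of called bases at only one site but removes every mixed call, which mirrors the qualitative pattern the abstract reports without reproducing any of its numbers.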

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ed64/8116680/dd5e05a581d1/mgen-6-434-g001.jpg
