SeqPurge: highly-sensitive adapter trimming for paired-end NGS data.

Authors

Sturm Marc, Schroeder Christopher, Bauer Peter

Affiliation

Institute of Medical Genetics and Applied Genomics, University Hospital Tübingen, Tübingen, Germany.

Publication

BMC Bioinformatics. 2016 May 10;17:208. doi: 10.1186/s12859-016-1069-7.

Abstract

BACKGROUND

Trimming of adapter sequences from short read data is a common preprocessing step during NGS data analysis. When performing paired-end sequencing, the overlap between the forward and reverse reads can be used to identify excess adapter sequences. This is exploited by several previously published adapter trimming tools. However, our evaluation on amplicon-based data shows that most of the current tools are not able to remove all adapter sequences and that adapter contamination may even lead to spurious variant calls.

RESULTS

Here we present SeqPurge (https://github.com/imgag/ngs-bits), a highly-sensitive adapter trimmer that uses a probabilistic approach to detect the overlap between forward and reverse reads of Illumina sequencing data. SeqPurge can detect very short adapter sequences, even if only one base long. Compared to other adapter trimmers specifically designed for paired-end data, we found that SeqPurge achieves a higher sensitivity. The number of remaining adapter bases after trimming is reduced by up to 90%, depending on the compared tool. In simulations with different error rates, we found that SeqPurge is also the most error-tolerant adapter trimmer in the comparison.
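The overlap idea described above can be illustrated with a minimal Python sketch. It is not SeqPurge's actual implementation (which ships as part of ngs-bits). The function names, the per-base chance-match probability of 0.25, the `max_prob` threshold and the `min_overlap` bound are all assumptions chosen for illustration. The sketch assumes that, when the sequenced insert is shorter than the read length, the start of the forward read equals the reverse complement of the start of the reverse read, and that everything after the insert is adapter.

```python
from math import comb

# Hypothetical helper names; none of these come from SeqPurge itself.
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def chance_match_probability(matches, length, p=0.25):
    """Binomial tail: probability of at least `matches` matching bases
    in `length` positions if the two sequences were unrelated (assumed p)."""
    return sum(
        comb(length, k) * p**k * (1 - p) ** (length - k)
        for k in range(matches, length + 1)
    )


def find_insert_length(read1, read2, max_prob=1e-6, min_overlap=10):
    """Return the most plausible insert length shorter than the reads,
    or None if no candidate overlap is unlikely enough to be chance."""
    n = min(len(read1), len(read2))
    best = None  # (insert_length, probability)
    # Candidates stop at n - 1, so a single trailing adapter base is detectable.
    for insert_len in range(min_overlap, n):
        a = read1[:insert_len]
        b = reverse_complement(read2[:insert_len])
        # Count agreeing positions; N never counts as a match.
        matches = sum(x == y and x != "N" for x, y in zip(a, b))
        prob = chance_match_probability(matches, insert_len)
        if prob <= max_prob and (best is None or prob < best[1]):
            best = (insert_len, prob)
    return None if best is None else best[0]


def trim_pair(read1, read2, max_prob=1e-6, min_overlap=10):
    """Cut both reads back to the detected insert; leave them unchanged otherwise."""
    insert_len = find_insert_length(read1, read2, max_prob, min_overlap)
    if insert_len is None:
        return read1, read2
    return read1[:insert_len], read2[:insert_len]
```

Calling `trim_pair(read1, read2)` on a read pair returns both reads cut to the detected insert length. Because mismatches only raise the binomial tail probability rather than rule a candidate out, this kind of scoring tolerates some sequencing errors, which is the general property the abstract attributes to a probabilistic approach. A production tool would typically also check the trimmed-off bases against the known adapter sequences.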

CONCLUSION

SeqPurge achieves a very high sensitivity and a high error-tolerance, combined with a specificity and runtime that are comparable to other state-of-the-art adapter trimmers. The very good adapter trimming performance, complemented with additional features such as quality-based trimming and basic quality control, makes SeqPurge an excellent choice for the pre-processing of paired-end NGS data.
