EARRINGS: an efficient and accurate adapter trimmer entails no a priori adapter sequences.

Authors

Wang Ting-Hsuan, Huang Cheng-Ching, Hung Jui-Hung

Affiliation

Department of Computer Science, College of Computer Science, National Chiao Tung University, National Yang Ming Chiao Tung University, Hsinchu 30010, Taiwan.

Publication

Bioinformatics. 2021 Jul 27;37(13):1846-1852. doi: 10.1093/bioinformatics/btab025.

Abstract

MOTIVATION

Cross-sample comparisons and large-scale meta-analyses based on next-generation sequencing (NGS) require replicable and universal data preprocessing, including the removal of adapter fragments from contaminated reads (i.e. adapter trimming). However, modern adapter trimmers need users to supply candidate adapter sequences for each sample, and these sequences are sometimes unavailable or incorrectly documented in repositories such as GEO or SRA. Large-scale meta-analyses are therefore jeopardized by suboptimal adapter trimming.
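To illustrate why conventional trimmers need the adapter sequence as an input, here is a minimal C++ sketch (not taken from EARRINGS) of 3' adapter trimming for a single-end read when the adapter is known. The function names, the minimum overlap of 3 bases and the 10% mismatch tolerance are illustrative assumptions, not parameters of any particular tool.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>

// Count mismatches between read[start, ...) and the adapter prefix of the same length.
std::size_t count_mismatches(const std::string& read, std::size_t start,
                             const std::string& adapter) {
    std::size_t mismatches = 0;
    for (std::size_t i = start, j = 0; i < read.size() && j < adapter.size(); ++i, ++j) {
        if (read[i] != adapter[j]) ++mismatches;
    }
    return mismatches;
}

// Remove a known 3' adapter from a single-end read (illustrative sketch).
// Candidate cut positions are scanned left to right; the first position where
// the read tail matches the adapter prefix within the mismatch tolerance is cut.
// Overlaps shorter than min_overlap are ignored to avoid chance matches.
std::string trim_known_adapter(const std::string& read, const std::string& adapter,
                               std::size_t min_overlap = 3,
                               double max_mismatch_rate = 0.1) {
    assert(max_mismatch_rate >= 0.0);
    for (std::size_t start = 0; start < read.size(); ++start) {
        const std::size_t overlap = std::min(read.size() - start, adapter.size());
        if (overlap < min_overlap) break;  // overlap never grows at later positions
        const std::size_t mm = count_mismatches(read, start, adapter);
        if (static_cast<double>(mm) <= max_mismatch_rate * static_cast<double>(overlap)) {
            return read.substr(0, start);  // keep only the bases before the adapter
        }
    }
    return read;  // no adapter found: the read is returned unchanged
}
```

For example, with the adapter prefix AGATCGGAAGAGC, the read CTTCCTCTTCCTAGATCGGAAG is trimmed to CTTCCTCTTCCT. Because scanning starts from the left, the longest adapter overlap wins, so both full-length and 3'-truncated adapters are removed; if the supplied adapter is wrong, however, nothing is trimmed.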

RESULTS

Here we introduce a set of fast and accurate adapter detection and trimming algorithms that require no a priori knowledge of adapter sequences. These algorithms are implemented in modern C++ and accelerated with SIMD and multithreading. Our experiments and benchmarks show that the implementation, EARRINGS, achieves accuracy comparable to that of existing adapter trimmers, and higher throughput, without being given any hint of the adapter sequences. EARRINGS is particularly useful for meta-analyses of large batches of datasets and can be incorporated into sequence analysis pipelines of any scale.
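The abstract does not describe how EARRINGS detects adapters. As a hedged illustration of how adapters can be removed without any a priori sequence, the sketch below uses a generic, well-known technique for paired-end reads (read-overlap analysis, also used by other trimmers such as fastp); it is not claimed to be the EARRINGS algorithm. When the DNA insert is shorter than the read length, read 1 begins with the insert and read 2 begins with its reverse complement, so the insert length can be found by matching the two mates, and every base past it is adapter. The function names and thresholds are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>

// Reverse complement of a DNA sequence; non-ACGT bases become 'N'.
std::string reverse_complement(const std::string& seq) {
    std::string rc(seq.rbegin(), seq.rend());
    for (char& base : rc) {
        switch (base) {
            case 'A': base = 'T'; break;
            case 'C': base = 'G'; break;
            case 'G': base = 'C'; break;
            case 'T': base = 'A'; break;
            default:  base = 'N'; break;
        }
    }
    return rc;
}

// Estimate the insert length of a read pair without any adapter sequence.
// If the insert (length I) is shorter than the reads, read1[0, I) equals the
// reverse complement of read2[0, I). Candidate lengths are tried from longest
// to shortest; 0 means no overlap was found (insert probably >= read length).
std::size_t estimate_insert_length(const std::string& r1, const std::string& r2,
                                   std::size_t min_overlap = 8,
                                   double max_mismatch_rate = 0.1) {
    assert(max_mismatch_rate >= 0.0);
    const std::size_t len = std::min(r1.size(), r2.size());
    for (std::size_t i = len; i > 0 && i >= min_overlap; --i) {
        const std::string rc2 = reverse_complement(r2.substr(0, i));
        std::size_t mm = 0;
        for (std::size_t k = 0; k < i; ++k) {
            if (r1[k] != rc2[k]) ++mm;
        }
        if (static_cast<double>(mm) <= max_mismatch_rate * static_cast<double>(i)) {
            return i;
        }
    }
    return 0;
}

// Trim both mates to the estimated insert length. The clipped tails are the
// adapter read-through, so no adapter sequence has to be supplied.
std::pair<std::string, std::string> trim_pair(const std::string& r1, const std::string& r2) {
    const std::size_t insert = estimate_insert_length(r1, r2);
    if (insert == 0) return {r1, r2};  // no overlap detected: leave the pair unchanged
    return {r1.substr(0, insert), r2.substr(0, insert)};
}
```

For example, with the 12-bp insert CTTCCTCTTCCT and 8 bases of adapter read-through (AGATCGGA) on each mate, estimate_insert_length returns 12 and trim_pair returns CTTCCTCTTCCT and AGGAAGAGGAAG. The clipped tails are the adapter read-through itself, so collecting them over many pairs is one way an adapter sequence could be inferred rather than supplied. Low-complexity inserts can produce spurious overlaps, which is one reason real tools add further checks.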

AVAILABILITY AND IMPLEMENTATION

EARRINGS is open-source software and is available at https://github.com/jhhung/EARRINGS.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.