FastqPuri: high-performance preprocessing of RNA-seq data.

Affiliations

Statistical Bioinformatics, Institute of Functional Genomics, University of Regensburg, Am BioPark 9, Regensburg, 93053, Germany.

Department of Marine Microbiology and Biogeochemistry, NIOZ Royal Netherlands Institute for Sea Research and Utrecht University, P.O. Box 59, Den Burg, 1790 AB, The Netherlands.

Publication

BMC Bioinformatics. 2019 May 3;20(1):226. doi: 10.1186/s12859-019-2799-0.

Abstract

BACKGROUND

RNA sequencing (RNA-seq) has become the standard high-throughput method for analyzing gene and transcript expression. Sequence alignment used to be a time-demanding step, but fast alignment methods have led to significant speed-ups in data analysis. Transcript-counting methods have sped things up even more: they skip mapping and instead quantify gene and transcript expression by evaluating whether a read is compatible with a transcript. Now the most time-demanding step in RNA-seq analysis is preprocessing the raw sequence data. This includes running quality control and adapter, contamination and quality filtering before transcript or gene quantification. To do so, many researchers chain different tools, but comprehensive, flexible and fast software that covers all preprocessing steps is currently missing.
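
Quality filtering, one of the preprocessing steps named above, typically decodes each read's quality string into Phred scores and discards reads with too many low-quality bases. The Python sketch below illustrates that idea under stated assumptions. It assumes Phred+33 encoding. The thresholds `min_q=20` and `max_low_frac=0.1` are hypothetical choices, as are the function names, and none of them are FastqPuri's defaults or API.

```python
from typing import List


def phred_scores(qual: str, offset: int = 33) -> List[int]:
    """Decode a fastq quality string into Phred scores (assumes Phred+33 encoding)."""
    return [ord(c) - offset for c in qual]


def passes_quality(qual: str, min_q: int = 20, max_low_frac: float = 0.1) -> bool:
    """Keep a read only if at most max_low_frac of its bases score below min_q.

    min_q and max_low_frac are hypothetical thresholds for illustration.
    """
    scores = phred_scores(qual)
    if not scores:
        return False  # an empty read carries no usable sequence
    low = sum(1 for q in scores if q < min_q)
    return low / len(scores) <= max_low_frac


print(passes_quality("IIIIIIIIII"))  # 'I' encodes Phred 40, so the read passes
print(passes_quality("IIII######"))  # '#' encodes Phred 2, 60% low-quality bases
```

Filtering on the fraction of low-quality bases, rather than on the mean score, keeps a few very high scores from masking a run of unreliable bases.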

RESULTS

We here present FastqPuri, a lightweight and highly efficient preprocessing tool for fastq data. FastqPuri provides sequence quality reports at the sample and dataset level. These reports include new plots that facilitate decisions on subsequent quality filtering. Moreover, FastqPuri efficiently removes adapter sequences and sequences from biological contamination. It accepts both single- and paired-end data in uncompressed or compressed fastq files. FastqPuri can be run stand-alone and is also suitable for use within pipelines. We benchmarked FastqPuri against existing tools and found that it is superior in terms of speed, memory usage, versatility and comprehensiveness.
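
Reading fastq input, compressed or not, and removing adapters can be sketched as follows. This is a minimal illustration under stated assumptions:

- gzip input is detected by its magic bytes;
- records are strict four-line fastq;
- adapter removal uses a simple exact-match heuristic. It cuts at the full adapter anywhere in the read, or at an adapter prefix of at least `min_overlap` bases at the 3' end.

FastqPuri's own adapter search may tolerate mismatches and differ in detail. All function and parameter names here are hypothetical.

```python
import gzip
from typing import IO, Iterator, Tuple


def open_fastq(path: str) -> IO[str]:
    """Open a fastq file in text mode; gzip input is detected by its magic bytes."""
    with open(path, "rb") as fh:
        magic = fh.read(2)
    if magic == b"\x1f\x8b":
        return gzip.open(path, "rt")
    return open(path, "r")


def read_fastq(path: str) -> Iterator[Tuple[str, str, str]]:
    """Yield (header, sequence, quality) from a strict four-line-per-record fastq file."""
    with open_fastq(path) as fh:
        while True:
            header = fh.readline().rstrip("\n")
            if not header:
                break  # end of file
            seq = fh.readline().rstrip("\n")
            fh.readline()  # skip the '+' separator line
            qual = fh.readline().rstrip("\n")
            yield header, seq, qual


def trim_adapter(seq: str, qual: str, adapter: str,
                 min_overlap: int = 3) -> Tuple[str, str]:
    """Cut the read at the first exact adapter occurrence, otherwise at the
    longest 3' suffix (>= min_overlap bases) that equals an adapter prefix."""
    pos = seq.find(adapter)
    if pos == -1:
        pos = len(seq)  # default: keep the whole read
        # Try the longest partial overlap first, down to min_overlap bases.
        for k in range(min(len(adapter) - 1, len(seq)), min_overlap - 1, -1):
            if seq.endswith(adapter[:k]):
                pos = len(seq) - k
                break
    return seq[:pos], qual[:pos]
```

For paired-end data the same steps would be applied to both mates in lockstep so that read pairs stay synchronized. A production trimmer would also allow a few sequencing errors in the adapter match.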

CONCLUSIONS

FastqPuri is a new tool that covers all aspects of short-read sequence data preprocessing. It was designed for RNA-seq data, to meet the need for fast preprocessing of fastq data before transcript and gene counting. However, it is suitable for processing any short-read sequencing data for which high sequence quality is needed, such as for genome assembly or SNV (single nucleotide variant) detection. FastqPuri is particularly flexible in filtering undesired biological sequences. It offers two approaches that optimize speed and memory usage depending on the total size of the potential contaminating sequences. FastqPuri is available at https://github.com/jengelmann/FastqPuri . It is implemented in C and R and licensed under GPL v3.
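
The abstract does not say what the two contamination-filtering approaches are. A common way to realize such a speed/memory trade-off is shown here as an assumption, not as FastqPuri's implementation. When the contaminant reference is small, its k-mers are indexed exactly in a hash set. When the reference is large, a Bloom filter is used instead. A Bloom filter occupies a fixed-size bit array and admits a small false-positive rate but no false negatives. The k-mer length, the 1% false-positive target and all names below are hypothetical.

```python
import hashlib
import math
from typing import Iterable, Iterator


def kmers(seq: str, k: int) -> Iterator[str]:
    """Yield all overlapping k-mers of seq (none if seq is shorter than k)."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


class BloomFilter:
    """Bit array with several hash positions per item: false positives possible,
    false negatives impossible."""

    def __init__(self, expected_items: int, fp_rate: float = 0.01):
        # Standard sizing: m = -n ln(p) / (ln 2)^2 bits, h = (m / n) ln 2 hashes.
        m = math.ceil(-expected_items * math.log(fp_rate) / (math.log(2) ** 2))
        self.num_bits = max(8, m)
        self.num_hashes = max(1, round(self.num_bits / max(1, expected_items) * math.log(2)))
        self.bits = bytearray((self.num_bits + 7) // 8)

    def _positions(self, item: str) -> Iterator[int]:
        # Double hashing: derive all bit positions from two 64-bit hash values.
        digest = hashlib.sha256(item.encode()).digest()
        h1 = int.from_bytes(digest[:8], "little")
        h2 = int.from_bytes(digest[8:16], "little") | 1
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, item: str) -> None:
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, item: str) -> bool:
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(item))


def build_index(contaminants: Iterable[str], k: int, use_bloom: bool):
    """Exact k-mer set for small references, Bloom filter for large ones."""
    seqs = list(contaminants)
    if not use_bloom:
        return {km for s in seqs for km in kmers(s, k)}
    n = sum(max(0, len(s) - k + 1) for s in seqs)
    bf = BloomFilter(max(1, n))
    for s in seqs:
        for km in kmers(s, k):
            bf.add(km)
    return bf


def contaminated_fraction(read: str, index, k: int) -> float:
    """Fraction of the read's k-mers found in the contaminant index."""
    ks = list(kmers(read, k))
    if not ks:
        return 0.0
    return sum(1 for km in ks if km in index) / len(ks)
```

A read could then be flagged as contaminated when `contaminated_fraction` exceeds some threshold. With the Bloom filter that fraction can only be overestimated, never underestimated. A production filter would also index the reverse complements of the contaminant sequences.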
