Department of Bioinformatics, HaploX Biotechnology, Shenzhen, China.
Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China.
Bioinformatics. 2018 Sep 1;34(17):i884-i890. doi: 10.1093/bioinformatics/bty560.
Quality control and preprocessing of FASTQ files are essential to providing clean data for downstream analysis. Traditionally, a different tool is used for each operation, such as quality control, adapter trimming and quality filtering. These tools are often insufficiently fast as most are developed using high-level programming languages (e.g. Python and Java) and provide limited multi-threading support. Reading and loading data multiple times also renders preprocessing slow and I/O inefficient.
We developed fastp as an ultra-fast FASTQ preprocessor with useful quality control and data-filtering features. It can perform quality control, adapter trimming, quality filtering, per-read quality pruning and many other operations with a single scan of the FASTQ data. This tool is developed in C++ and has multi-threading support. Based on our evaluation, fastp is 2-5 times faster than other FASTQ preprocessing tools such as Trimmomatic or Cutadapt despite performing far more operations than similar tools.
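The single-scan idea described above can be illustrated with a minimal Python sketch. It is not fastp's implementation (fastp is written in C++ and uses its own adapter-detection and filtering logic). The function names, the exact-match adapter search, the Phred+33 offset and the thresholds below are all assumptions chosen for illustration. The point is only that statistics, adapter trimming and quality filtering can share one pass over the records instead of one pass per tool.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List, Tuple

Record = Tuple[str, str, str]  # (header, sequence, quality string)


def read_fastq(lines: Iterable[str]) -> Iterator[Record]:
    """Yield records from well-formed, 4-line-per-record FASTQ input."""
    it = (line.rstrip("\n") for line in lines)
    # zip over the same iterator four times groups lines into records.
    for header, seq, _plus, qual in zip(it, it, it, it):
        yield header, seq, qual


@dataclass
class Stats:
    reads_in: int = 0
    reads_out: int = 0
    bases_in: int = 0
    adapters_trimmed: int = 0


def trim_adapter(seq: str, qual: str, adapter: str) -> Tuple[str, str, bool]:
    """Cut the read at the first exact adapter match (illustrative only)."""
    pos = seq.find(adapter)
    if pos == -1:
        return seq, qual, False
    return seq[:pos], qual[:pos], True


def mean_quality(qual: str, offset: int = 33) -> float:
    """Mean Phred score, assuming Phred+33 encoding."""
    if not qual:
        return 0.0
    return sum(ord(c) - offset for c in qual) / len(qual)


def preprocess(
    lines: Iterable[str],
    adapter: str,
    min_mean_q: float = 20.0,  # hypothetical threshold
    min_len: int = 1,          # hypothetical threshold
) -> Tuple[List[Record], Stats]:
    """Collect QC stats, trim adapters and filter reads in a single pass."""
    stats = Stats()
    kept: List[Record] = []
    for header, seq, qual in read_fastq(lines):
        # QC statistics are gathered on the raw read...
        stats.reads_in += 1
        stats.bases_in += len(seq)
        # ...then the same in-memory record is trimmed and filtered.
        seq, qual, trimmed = trim_adapter(seq, qual, adapter)
        stats.adapters_trimmed += int(trimmed)
        if len(seq) >= min_len and mean_quality(qual) >= min_mean_q:
            kept.append((header, seq, qual))
            stats.reads_out += 1
    return kept, stats


demo = [
    "@r1", "ACGTACGTAGATCGGAAGAGC", "+", "I" * 21,  # high quality, has adapter
    "@r2", "ACGTACGT", "+", "#" * 8,                # low quality, no adapter
]
kept, stats = preprocess(demo, adapter="AGATCGGAAGAGC")
print(kept)
print(stats)
```

In this toy input the first read is trimmed at the adapter and kept, while the second is dropped by the mean-quality filter. Each record is read and parsed once, which is the I/O saving the abstract attributes to a single scan.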
The open-source code and corresponding instructions are available at https://github.com/OpenGene/fastp.