Ultrafast one-pass FASTQ data preprocessing, quality control, and deduplication using fastp.

Author information

Chen Shifu

Affiliations

HaploX Biotechnology, Shenzhen, China.

Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China.

Publication information

Imeta. 2023 May 8;2(2):e107. doi: 10.1002/imt2.107. eCollection 2023 May.

DOI:10.1002/imt2.107
PMID:38868435
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10989850/
Abstract

A large amount of sequencing data is generated and processed every day as sequencing technology continues to evolve and sequencing applications expand. One consequence of this data explosion is the increasing cost and complexity of data processing. Preprocessing FASTQ data, which means removing adapter contamination, filtering low-quality reads, and correcting wrongly represented bases, is an indispensable but resource-intensive part of sequencing data analysis. Therefore, although many software applications have been developed to address this problem, bioinformatics scientists and engineers are still pursuing faster, simpler, and more energy-efficient software. Several years ago, the author developed fastp, an ultrafast all-in-one FASTQ data preprocessor with many modern features. The software has been well received by many bioinformatics users and has been continuously maintained and updated. Since the first publication on fastp, it has been greatly improved, making it even faster and more powerful. For instance, the duplication evaluation module has been improved, and a new deduplication module has been added. This study aimed to introduce the new features of fastp and demonstrate how it was designed and implemented.
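The preprocessing steps named in the abstract (removing adapter contamination, filtering low-quality reads) and the duplication evaluation and deduplication modules can be illustrated with a minimal one-pass Python sketch. This is not fastp's implementation: fastp is written in C++ and uses its own adapter detection and duplicate-counting data structures. The sketch assumes well-formed single-end FASTQ with Phred+33 qualities, a known adapter sequence (the hypothetical default `AGATCGGAAGAGC`, a common Illumina adapter prefix), exact-match adapter trimming with no mismatches, and illustrative thresholds (Phred 15, at most 40% low-quality bases). The names `parse_fastq`, `trim_adapter`, `passes_quality`, and `preprocess` are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List, Tuple

# Assumption: a common Illumina adapter prefix, used here for illustration only.
DEFAULT_ADAPTER = "AGATCGGAAGAGC"


@dataclass(frozen=True)
class Read:
    name: str
    seq: str
    qual: str  # Phred+33 encoded quality string


def parse_fastq(lines: Iterable[str]) -> Iterator[Read]:
    """Yield reads from 4-line FASTQ records (assumes well-formed input)."""
    it = iter(lines)
    for header in it:
        header = header.rstrip("\n")
        if not header:
            continue  # skip blank lines between records
        seq = next(it).rstrip("\n")
        next(it)  # '+' separator line
        qual = next(it).rstrip("\n")
        yield Read(header[1:], seq, qual)


def trim_adapter(read: Read, adapter: str, min_overlap: int = 4) -> Read:
    """Cut the read where the adapter (or a 3'-end prefix of it) first matches exactly."""
    seq = read.seq
    for start in range(len(seq) - min_overlap + 1):
        overlap = min(len(adapter), len(seq) - start)
        if seq[start:start + overlap] == adapter[:overlap]:
            return Read(read.name, seq[:start], read.qual[:start])
    return read


def passes_quality(read: Read, min_phred: int = 15, max_low_frac: float = 0.4) -> bool:
    """Reject empty reads and reads whose fraction of low-quality bases is too high."""
    if not read.qual:
        return False
    low = sum(1 for c in read.qual if ord(c) - 33 < min_phred)
    return low / len(read.qual) <= max_low_frac


def preprocess(lines: Iterable[str], adapter: str = DEFAULT_ADAPTER) -> Tuple[List[Read], float]:
    """Trim, filter, and deduplicate in a single pass; also return the duplication rate."""
    seen = set()  # exact trimmed sequences already kept
    kept: List[Read] = []
    passed = 0
    for read in parse_fastq(lines):
        read = trim_adapter(read, adapter)
        if not passes_quality(read):
            continue
        passed += 1
        if read.seq in seen:
            continue  # duplicate of an earlier read: drop it
        seen.add(read.seq)
        kept.append(read)
    dup_rate = (passed - len(kept)) / passed if passed else 0.0
    return kept, dup_rate


if __name__ == "__main__":
    fastq = [
        "@r1", "ACGTACGT", "+", "IIIIIIII",
        "@r2", "ACGTACGT", "+", "IIIIIIII",   # exact duplicate of r1
        "@r3", "TTTTGGGG", "+", "!!!!!III",   # 5 of 8 bases below Phred 15
    ]
    kept, rate = preprocess(fastq)
    print([r.name for r in kept], rate)  # -> ['r1'] 0.5
```

Keeping every unique sequence in a Python `set` is the simplest exact-match approach and its memory grows with the number of distinct reads; a production tool handling billions of reads would need a more compact structure, which is the kind of trade-off the abstract's improved duplication evaluation and deduplication modules address.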


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b46f/10989850/2e6f759adbad/IMT2-2-e107-g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b46f/10989850/aa17876920f9/IMT2-2-e107-g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b46f/10989850/80d3ccc3a750/IMT2-2-e107-g002.jpg
