The Sanger FASTQ file format for sequences with quality scores, and the Solexa/Illumina FASTQ variants.

Affiliation

Plant Pathology, SCRI, Invergowrie, Dundee DD2 5DA, UK.

Publication information

Nucleic Acids Res. 2010 Apr;38(6):1767-71. doi: 10.1093/nar/gkp1137. Epub 2009 Dec 16.

DOI: 10.1093/nar/gkp1137

PMID: 20015970

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC2847217/

Abstract

FASTQ has emerged as a common file format for sharing sequencing read data combining both the sequence and an associated per base quality score, despite lacking any formal definition to date, and existing in at least three incompatible variants. This article defines the FASTQ format, covering the original Sanger standard, the Solexa/Illumina variants and conversion between them, based on publicly available information such as the MAQ documentation and conventions recently agreed by the Open Bioinformatics Foundation projects Biopython, BioPerl, BioRuby, BioJava and EMBOSS. Being an open access publication, it is hoped that this description, with the example files provided as Supplementary Data, will serve in future as a reference for this important file format.
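As a rough illustration of the variants and the conversion between them that the abstract refers to, the sketch below decodes a quality string and re-encodes it as Sanger FASTQ. The ASCII offsets (33 for Sanger, 64 for Solexa and Illumina 1.3+) and the Solexa-to-PHRED formula are not stated in the abstract; they are taken here as the widely documented conventions, and the function names are hypothetical. It is a minimal sketch under those assumptions, not a reference implementation.

```python
import math
from typing import List

# Assumed ASCII offsets for the three FASTQ variants (not given in the abstract).
OFFSETS = {"sanger": 33, "solexa": 64, "illumina": 64}


def solexa_to_phred(q_solexa: float) -> float:
    """Map an odds-based Solexa score to a probability-based PHRED score."""
    return 10 * math.log10(10 ** (q_solexa / 10) + 1)


def decode_phred(quality: str, variant: str) -> List[int]:
    """Decode a FASTQ quality string into integer PHRED scores."""
    raw = [ord(ch) - OFFSETS[variant] for ch in quality]
    if variant == "solexa":
        # Solexa scores use a different scale, so convert and round.
        return [round(solexa_to_phred(q)) for q in raw]
    return raw


def to_sanger(quality: str, variant: str) -> str:
    """Re-encode a quality string from `variant` as Sanger FASTQ (offset 33)."""
    return "".join(chr(q + 33) for q in decode_phred(quality, variant))
```

For example, under these assumptions `to_sanger("h", "illumina")` yields `"I"`, since both characters encode PHRED 40 in their respective variants. Rounding the converted Solexa scores loses a little precision, which is one reason conversions between the variants are not always exactly reversible.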

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0992/2847217/c81bf3b7c984/gkp1137f1.jpg
