Plant Pathology, SCRI, Invergowrie, Dundee DD2 5DA, UK.
Nucleic Acids Res. 2010 Apr;38(6):1767-71. doi: 10.1093/nar/gkp1137. Epub 2009 Dec 16.
FASTQ has emerged as a common file format for sharing sequencing read data, combining both the sequence and an associated per-base quality score, despite lacking any formal definition to date and existing in at least three incompatible variants. This article defines the FASTQ format, covering the original Sanger standard, the Solexa/Illumina variants and conversion between them, based on publicly available information such as the MAQ documentation and conventions recently agreed by the Open Bioinformatics Foundation projects Biopython, BioPerl, BioRuby, BioJava and EMBOSS. Because this is an open access publication, it is hoped that this description, with the example files provided as Supplementary Data, will serve in future as a reference for this important file format.
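As an illustration of what conversion between the variants involves, the following is a minimal Python sketch, not the article's reference implementation. The abstract does not state the encoding details, so the sketch assumes the commonly documented conventions: Sanger FASTQ stores PHRED scores with an ASCII offset of 33, Illumina 1.3+ stores PHRED scores with an offset of 64, and early Solexa/Illumina files store Solexa (odds-based) scores with an offset of 64. All function and constant names are hypothetical.

```python
import math

# Assumed ASCII offsets (conventional values, not given in the abstract).
SANGER_OFFSET = 33    # Sanger FASTQ: PHRED scores
ILLUMINA_OFFSET = 64  # Solexa and Illumina 1.3+ FASTQ


def solexa_to_phred(q_solexa: float) -> float:
    """Map a Solexa score to a PHRED score.

    Solexa scores are -10*log10(p / (1 - p)); PHRED scores are -10*log10(p).
    """
    return 10 * math.log10(10 ** (q_solexa / 10) + 1)


def phred_to_solexa(q_phred: float) -> float:
    """Map a PHRED score to a Solexa score (undefined for PHRED <= 0)."""
    if q_phred <= 0:
        raise ValueError("PHRED score must be positive to map to Solexa")
    return 10 * math.log10(10 ** (q_phred / 10) - 1)


def illumina13_to_sanger(quality: str) -> str:
    """Re-encode an Illumina 1.3+ quality string (PHRED+64) as Sanger (PHRED+33)."""
    return "".join(
        chr(ord(c) - ILLUMINA_OFFSET + SANGER_OFFSET) for c in quality
    )


def solexa_to_sanger(quality: str) -> str:
    """Re-encode a Solexa quality string (Solexa+64) as Sanger (PHRED+33).

    Each score is converted to PHRED and rounded to the nearest integer.
    """
    return "".join(
        chr(round(solexa_to_phred(ord(c) - ILLUMINA_OFFSET)) + SANGER_OFFSET)
        for c in quality
    )
```

Under these assumptions, converting from Illumina 1.3+ to Sanger is a pure shift of the character codes, whereas converting from Solexa requires the logarithmic mapping and rounding, so it loses precision at low scores.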