Department of Molecular Life Science, Tokai University School of Medicine, Isehara, Japan.
Micro/Nano Technology Center, Tokai University, Hiratsuka, Japan.
Bioinformatics. 2019 Oct 1;35(19):3826-3828. doi: 10.1093/bioinformatics/btz144.
DNA sequence databases use compression such as gzip to reduce the required storage space and network transmission time. We describe Nucleotide Archival Format (NAF), a new file format for lossless reference-free compression of FASTA- and FASTQ-formatted nucleotide sequences. The NAF compression ratio is comparable to that of the best DNA compressors, while decompression is dramatically faster. We compared our format with the DNA compressors DELIMINATE and MFCompress, and with the general-purpose compressors gzip, bzip2, xz, brotli and zstd.
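As a rough illustration of what a compression ratio comparison among general-purpose compressors involves, the following minimal sketch compresses a synthetic FASTA record with gzip, bzip2 and xz from the Python standard library. It is not the benchmark used in the paper: the helper names `make_fasta` and `compression_ratios` are hypothetical, the random sequence is only a stand-in for real genomic data, brotli and zstd are left out to keep the sketch standard-library only, and the resulting numbers say nothing about NAF itself.

```python
import bz2
import gzip
import lzma
import random


def make_fasta(n_bases, seed=0, width=60):
    """Build a synthetic single-record FASTA file as bytes (illustrative only)."""
    rng = random.Random(seed)
    seq = "".join(rng.choice("ACGT") for _ in range(n_bases))
    lines = [">synthetic_record"]
    # Wrap the sequence at a fixed line width, as FASTA files commonly do.
    lines += [seq[i:i + width] for i in range(0, n_bases, width)]
    return ("\n".join(lines) + "\n").encode("ascii")


def compression_ratios(data):
    """Return original size divided by compressed size for each compressor."""
    compressors = {
        "gzip": gzip.compress,
        "bzip2": bz2.compress,
        "xz": lzma.compress,
    }
    return {name: len(data) / len(fn(data)) for name, fn in compressors.items()}


fasta = make_fasta(100_000)
ratios = compression_ratios(fasta)
for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2f}x")
```

A real evaluation would use actual sequence databases and would also record compression and decompression time, since decompression speed is the property the abstract emphasizes.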
The NAF compressor and decompressor, as well as the format specification, are available at https://github.com/KirillKryukov/naf. The format specification is in the public domain. The compressor and decompressor are open source under the zlib/libpng license, free for nearly any use.
Supplementary data are available at Bioinformatics online.