Institute of Informatics, Silesian University of Technology, Akademicka 16, 44-100 Gliwice, Poland.
Bioinformatics. 2011 Mar 15;27(6):860-2. doi: 10.1093/bioinformatics/btr014. Epub 2011 Jan 19.
Modern sequencing instruments can generate at least hundreds of millions of short reads of genomic data. Such volumes of data require storage methods that are space-efficient, provide quick access to any record, and allow fast decompression.
We present a specialized compression algorithm for genomic data in FASTQ format that outperforms its competitor, G-SQZ, as shown on a number of datasets from the 1000 Genomes Project (www.1000genomes.org).
DSRC is freely available at http://sun.aei.polsl.pl/dsrc.