Ochoa I, No A, Hernaez M, Weissman T
Department of Electrical Engineering, Stanford University, Stanford CA 94305.
Proc Inf Theory Workshop. 2016 Sep;2016:121-125. doi: 10.1109/ITW.2016.7606808. Epub 2016 Oct 27.
Massive amounts of sequencing data are being generated thanks to advances in sequencing technology and a dramatic drop in sequencing cost. Much of the data consists of nucleotides and the corresponding quality scores that indicate their reliability. The quality scores are more difficult to compress and are themselves noisy. As a result, lossy compression of the quality scores has recently been proposed to reduce storage costs. Further, it has been shown that lossy compression, at some specific rates, can achieve a performance on variant calling similar to that achieved with the losslessly compressed data. We propose CROMqs, a new lossy compressor for the quality scores with the property of "infinitesimal successive refinability". This property allows the decoder to decompress the data iteratively, without having to agree with the encoder on a specific rate prior to compression. This characteristic is particularly useful in practice, as in most cases the appropriate rate at which the lossy compressor should operate cannot be established prior to compression. The property can also be of interest in scenarios involving streaming of genomic data. CROMqs is the first infinitesimal successive refinement lossy compressor for the quality scores in the literature, and we show that it obtains a rate-distortion performance comparable to that of previously proposed algorithms. We also show that CROMqs achieves a performance on variant calling comparable to that of the losslessly compressed data.
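To make the idea of successive refinement concrete, the following is a minimal illustrative sketch in Python. It is not the CROMqs algorithm, whose details are not given in this abstract. It assumes quality scores are integers in the range 0 to 63 and uses plain bit-plane coding, which refines the reconstruction one bit at a time rather than infinitesimally. The function names `encode_bitplanes` and `decode_prefix` and the sample scores are hypothetical. The point it shows is that the encoder produces one stream, and the decoder can stop after any prefix without a rate having been agreed in advance.

```python
def encode_bitplanes(scores, bits=6):
    """Split each score into bit-planes, most significant plane first."""
    return [[(q >> b) & 1 for q in scores] for b in range(bits - 1, -1, -1)]


def decode_prefix(planes, n_scores, bits=6):
    """Reconstruct scores from only the bit-planes received so far."""
    recon = [0] * n_scores
    for k, plane in enumerate(planes):
        shift = bits - 1 - k
        for i, bit in enumerate(plane):
            recon[i] |= bit << shift
    # Bits not yet received are unknown: place the estimate at the
    # midpoint of the remaining interval.
    missing = bits - len(planes)
    if missing > 0:
        half = 1 << (missing - 1)
        recon = [r + half for r in recon]
    return recon


# Hypothetical quality scores, assumed to lie in 0..63.
scores = [2, 14, 27, 33, 38, 40, 40, 41, 37, 12]
planes = encode_bitplanes(scores)

# The decoder may stop after any number of planes; the worst-case
# absolute error bound halves with every additional plane received.
for k in range(len(planes) + 1):
    recon = decode_prefix(planes[:k], len(scores))
    worst = max(abs(a - b) for a, b in zip(scores, recon))
    bound = (1 << (6 - k)) >> 1
    print(k, "planes: max abs error", worst, "bound", bound)
```

In this sketch the rate is chosen at decoding time simply by truncating the list of planes, which is the practical benefit the abstract attributes to successive refinability. Receiving all six planes reproduces the original scores exactly.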