CROMqs: an infinitesimal successive refinement lossy compressor for the quality scores.

Authors

Ochoa I, No A, Hernaez M, Weissman T

Affiliation

Department of Electrical Engineering, Stanford University, Stanford CA 94305.

Publication information

Proc Inf Theory Workshop. 2016 Sep;2016:121-125. doi: 10.1109/ITW.2016.7606808. Epub 2016 Oct 27.

Abstract

Massive amounts of sequencing data are being generated thanks to advances in sequencing technology and a dramatic drop in sequencing cost. Much of the data consist of nucleotides and the corresponding quality scores that indicate their reliability. The latter are more difficult to compress and are themselves noisy. As a result, lossy compression of the quality scores has recently been proposed to alleviate storage costs. Further, it has been shown that lossy compression, at some specific rates, can achieve variant-calling performance similar to that achieved with losslessly compressed data. We propose CROMqs, a new lossy compressor for the quality scores with the property of "infinitesimal successive refinability". This property allows the decoder to decompress the data iteratively without having to agree with the encoder on a specific rate prior to compression. This characteristic is particularly useful in practice, as in most cases the appropriate rate at which the lossy compressor should operate cannot be established prior to compression. Further, this property can be of interest in scenarios involving streaming of genomic data. CROMqs is the first infinitesimal successive refinement lossy compressor for the quality scores in the literature, and we show that it obtains rate-distortion performance comparable to that of previously proposed algorithms. We also show that CROMqs achieves variant-calling performance comparable to that of losslessly compressed data.
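
The successive-refinement property described above can be illustrated with a toy sketch. The Python below is not the CROMqs algorithm, whose details the abstract does not give. It assumes a plain bit-plane split of 6-bit quality scores, and the names `encode_bitplanes` and `decode_prefix` are hypothetical. The point it shows is only that the encoder emits one stream without choosing a rate, and the decoder can stop after any prefix and still obtain a reconstruction whose worst-case error shrinks as more of the stream arrives.

```python
# Toy illustration of successive refinement (NOT the CROMqs algorithm).
# Assumption: quality scores are integers in [0, 63], i.e. they fit in 6 bits.

BITS = 6


def encode_bitplanes(scores, bits=BITS):
    """Split scores into bit planes, most significant plane first."""
    return [[(q >> b) & 1 for q in scores] for b in range(bits - 1, -1, -1)]


def decode_prefix(planes, n, bits=BITS):
    """Reconstruct n scores from however many leading planes were received."""
    recon = [0] * n
    for i, plane in enumerate(planes):
        shift = bits - 1 - i
        for j, bit in enumerate(plane):
            recon[j] |= bit << shift
    # Bits not yet received leave an interval of uncertainty;
    # place the estimate at the middle of that interval.
    remaining = bits - len(planes)
    if remaining > 0:
        half = 1 << (remaining - 1)
        recon = [r + half for r in recon]
    return recon


if __name__ == "__main__":
    scores = [40, 38, 12, 2, 33, 25]  # made-up example values
    planes = encode_bitplanes(scores)
    for k in range(BITS + 1):
        rec = decode_prefix(planes[:k], len(scores))
        worst = max(abs(a - b) for a, b in zip(scores, rec))
        print(f"planes decoded: {k}, max abs error: {worst}")
```

In this sketch each additional plane halves the bound on the per-score error, and decoding all six planes recovers the input exactly. A real compressor would also entropy-code the stream and refine in much finer steps than one bit per score.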
