AQUa: an adaptive framework for compression of sequencing quality scores with random access functionality.

Author information

Department of Electronics and Information Systems, IDLab, Ghent University - IMEC, Ghent, Belgium.

Center for Biotech Data Science, Ghent University Global Campus, Songdo, Incheon 305-701, Republic of Korea.

Publication information

Bioinformatics. 2018 Feb 1;34(3):425-433. doi: 10.1093/bioinformatics/btx607.

Abstract

MOTIVATION

The past decade has seen the introduction of new technologies that significantly lowered the cost of genome sequencing. As a result, the amount of genomic data that must be stored and transmitted is increasing exponentially. To mitigate storage and transmission issues, we introduce a framework for lossless compression of quality scores.

RESULTS

This article proposes AQUa, an adaptive framework for lossless compression of quality scores. To compress these quality scores, AQUa uses a configurable set of coding tools, extended with a Context-Adaptive Binary Arithmetic Coding scheme. Benchmarked against generic single-pass compressors, AQUa reduces file sizes by up to 38.49% compared with GNU Gzip and by up to 6.48% compared with 7-Zip at the Ultra setting, while still providing support for random access. Compared with SCALCE, a purpose-built, single-pass, state-of-the-art compressor that does not support random access, AQUa reduces file sizes by up to 21.14%. Compared with QVZ, a purpose-built, dual-pass, state-of-the-art compressor that does not support random access, AQUa's files are 6.42-33.47% larger. For one test file, however, the AQUa file is 0.38% smaller, illustrating the strength of our single-pass compression framework. This work has been spurred by the current activity on genomic information representation (MPEG-G) within the ISO/IEC SC29/WG11 technical committee.
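As a rough illustration of what random access means for compressed quality scores, the sketch below compresses lines in independent blocks and keeps a small index, so that a single line can be read by decoding only its block. This is not AQUa's method: zlib stands in for AQUa's coding tools and arithmetic coder, and the function names, the block size, and the sample quality strings are all assumptions made for the example.

```python
import zlib

def compress_blocks(lines, block_size=4):
    """Compress quality-score lines in independent blocks.

    Returns the concatenated compressed payload and an index of
    (byte offset, byte length) pairs, one per block.
    """
    payload = bytearray()
    index = []
    for start in range(0, len(lines), block_size):
        raw = "\n".join(lines[start:start + block_size]).encode("ascii")
        blob = zlib.compress(raw, 9)  # stand-in codec, not AQUa's
        index.append((len(payload), len(blob)))
        payload += blob
    return bytes(payload), index

def read_line(payload, index, line_no, block_size=4):
    """Decode only the block that holds line_no and return that line."""
    offset, length = index[line_no // block_size]
    block = zlib.decompress(payload[offset:offset + length]).decode("ascii")
    return block.split("\n")[line_no % block_size]

# Made-up Phred+33 quality strings, for illustration only.
quality_lines = [
    "IIIIIHHHGGFFEEDD",
    "IIIIIIIIHHHHGGGG",
    "HHHHGGGGFFFFEEEE",
    "IIIIHHHHIIIIHHHH",
    "GGGGFFFFEEEEDDDD",
    "IIIIIIIIIIIIIIII",
    "HHGGFFEEDDCCBBAA",
    "IIHHIIHHIIHHIIHH",
    "FFFFFFFFEEEEEEEE",
    "IIIIIIHHHHHHGGGG",
]

payload, index = compress_blocks(quality_lines)
print(read_line(payload, index, 6))  # decodes the second block only
```

Smaller blocks make each lookup cheaper but give the compressor less context per block, so in a scheme like this the block size trades compression ratio against access granularity.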

AVAILABILITY AND IMPLEMENTATION

The software is available on GitHub: https://github.com/tparidae/AQUa.

CONTACT

tom.paridaens@ugent.be.
