
AQUa: an adaptive framework for compression of sequencing quality scores with random access functionality.

Affiliations

Department of Electronics and Information Systems, IDLab, Ghent University - IMEC, Ghent, Belgium.

Center for Biotech Data Science, Ghent University Global Campus, Songdo, Incheon 305-701, Republic of Korea.

Publication information

Bioinformatics. 2018 Feb 1;34(3):425-433. doi: 10.1093/bioinformatics/btx607.

DOI:10.1093/bioinformatics/btx607
PMID:29028894
Abstract

MOTIVATION

The past decade has seen the introduction of new technologies that significantly lowered the cost of genome sequencing. As a result, the amount of genomic data that must be stored and transmitted is increasing exponentially. To mitigate storage and transmission issues, we introduce a framework for lossless compression of quality scores.

RESULTS

This article proposes AQUa, an adaptive framework for lossless compression of quality scores. To compress these quality scores, AQUa makes use of a configurable set of coding tools, extended with a Context-Adaptive Binary Arithmetic Coding scheme. When benchmarking AQUa against generic single-pass compressors, file sizes are reduced by up to 38.49% when comparing with GNU Gzip and by up to 6.48% when comparing with 7-Zip at the Ultra Setting, while still providing support for random access. When comparing AQUa with the purpose-built, single-pass, and state-of-the-art compressor SCALCE, which does not support random access, file sizes are reduced by up to 21.14%. When comparing AQUa with the purpose-built, dual-pass, and state-of-the-art compressor QVZ, which does not support random access, file sizes are larger by 6.42-33.47%. However, for one test file, the file size is 0.38% smaller, illustrating the strength of our single-pass compression framework. This work has been spurred by the current activity on genomic information representation (MPEG-G) within the ISO/IEC SC29/WG11 technical committee.
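The abstract does not describe how AQUa implements random access. As a rough illustration of the general idea only, the sketch below compresses quality-score lines in independent blocks and keeps an index of block offsets, so a single line can be recovered by decoding one block. It uses `zlib` as a stand-in codec, not AQUa's coding tools or its Context-Adaptive Binary Arithmetic Coding scheme, and the function names `compress_blocks` and `read_line`, the `block_size` parameter, and the sample quality strings are all hypothetical.

```python
import zlib


def compress_blocks(quality_lines, block_size=4):
    """Compress quality-score lines in independent blocks.

    Returns (payload, index), where index[i] is the (offset, length)
    of compressed block i inside payload.
    """
    payload = bytearray()
    index = []
    for start in range(0, len(quality_lines), block_size):
        block = "\n".join(quality_lines[start:start + block_size]).encode("ascii")
        compressed = zlib.compress(block, 9)  # stand-in codec, not CABAC
        index.append((len(payload), len(compressed)))
        payload += compressed
    return bytes(payload), index


def read_line(payload, index, line_no, block_size=4):
    """Decode only the block that holds line_no and return that line."""
    offset, length = index[line_no // block_size]
    block = zlib.decompress(payload[offset:offset + length]).decode("ascii")
    return block.split("\n")[line_no % block_size]


# Made-up Phred+33 quality strings, for illustration only.
qualities = [
    "IIIIHHHGGFFEED#",
    "IIIIIIIIHHHGG##",
    "HHHHGGGFFFEEDDC",
    "IIIHHHGGGFFF###",
    "GGGGFFFFEEEEDDD",
]
payload, index = compress_blocks(qualities, block_size=2)
assert read_line(payload, index, 3, block_size=2) == qualities[3]
```

Smaller blocks make single-line lookups cheaper but give the codec less context per block, which generally costs compression ratio. Any framework that combines compression with random access has to balance these two.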

AVAILABILITY AND IMPLEMENTATION

The software is available on GitHub: https://github.com/tparidae/AQUa.

CONTACT

tom.paridaens@ugent.be.


Similar articles

1
AQUa: an adaptive framework for compression of sequencing quality scores with random access functionality.
Bioinformatics. 2018 Feb 1;34(3):425-433. doi: 10.1093/bioinformatics/btx607.
2
AFRESh: an adaptive framework for compression of reads and assembled sequences with random access functionality.
Bioinformatics. 2017 May 15;33(10):1464-1472. doi: 10.1093/bioinformatics/btx001.
3
CMIC: an efficient quality score compressor with random access functionality.
BMC Bioinformatics. 2022 Jul 23;23(1):294. doi: 10.1186/s12859-022-04837-1.
4
SCALCE: boosting sequence compression algorithms using locally consistent encoding.
Bioinformatics. 2012 Dec 1;28(23):3051-7. doi: 10.1093/bioinformatics/bts593. Epub 2012 Oct 9.
5
LCQS: an efficient lossless compression tool of quality scores with random access functionality.
BMC Bioinformatics. 2020 Mar 18;21(1):109. doi: 10.1186/s12859-020-3428-7.
6
smallWig: parallel compression of RNA-seq WIG files.
Bioinformatics. 2016 Jan 15;32(2):173-80. doi: 10.1093/bioinformatics/btv561. Epub 2015 Sep 30.
7
ChIPWig: a random access-enabling lossless and lossy compression method for ChIP-seq data.
Bioinformatics. 2018 Mar 15;34(6):911-919. doi: 10.1093/bioinformatics/btx685.
8
FaStore: a space-saving solution for raw sequencing data.
Bioinformatics. 2018 Aug 15;34(16):2748-2756. doi: 10.1093/bioinformatics/bty205.
9
LFQC: a lossless compression algorithm for FASTQ files.
Bioinformatics. 2015 Oct 15;31(20):3276-81. doi: 10.1093/bioinformatics/btv384. Epub 2015 Jun 20.
10
mspack: efficient lossless and lossy mass spectrometry data compression.
Bioinformatics. 2021 Nov 5;37(21):3923-3925. doi: 10.1093/bioinformatics/btab636.

Cited by

1
PQSDC: a parallel lossless compressor for quality scores data via sequences partition and run-length prediction mapping.
Bioinformatics. 2024 May 2;40(5). doi: 10.1093/bioinformatics/btae323.
2
CMIC: an efficient quality score compressor with random access functionality.
BMC Bioinformatics. 2022 Jul 23;23(1):294. doi: 10.1186/s12859-022-04837-1.
3
FCLQC: fast and concurrent lossless quality scores compressor.
BMC Bioinformatics. 2021 Dec 20;22(1):606. doi: 10.1186/s12859-021-04516-7.
4
LCQS: an efficient lossless compression tool of quality scores with random access functionality.
BMC Bioinformatics. 2020 Mar 18;21(1):109. doi: 10.1186/s12859-020-3428-7.