SPRING: a next-generation compressor for FASTQ data.

Affiliations

Department of Electrical Engineering, Stanford University, Stanford, CA, USA.

Department of Electrical and Computer Engineering, University of Illinois at Urbana-Champaign, Urbana, IL, USA.

Publication information

Bioinformatics. 2019 Aug 1;35(15):2674-2676. doi: 10.1093/bioinformatics/bty1015.

PMID: 30535063
Full text (PMC): https://pmc.ncbi.nlm.nih.gov/articles/PMC6662292/
Abstract

MOTIVATION

High-throughput sequencing technologies produce huge amounts of data in the form of short genomic reads, associated quality values and read identifiers. Because these FASTQ datasets are highly structured, general-purpose compressors cannot fully exploit much of their inherent redundancy. Although many FASTQ compressors have been designed, most lack support for one or more crucial properties, such as variable-length reads, scalability to high-coverage datasets, pairing-preserving compression and lossless compression.
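The motivation rests on FASTQ's structure: each record carries an identifier, a base sequence and a per-base quality string, and these three fields have very different statistics. As a generic illustration (not SPRING's implementation), the sketch below parses 4-line FASTQ records and splits them into three separate streams so that each could be modeled on its own. The function names `parse_fastq`, `split_streams` and `compressed_sizes` are hypothetical, and zlib merely stands in for whatever per-stream coder a real tool would use.

```python
import zlib
from typing import List, Tuple

Record = Tuple[str, str, str]  # (identifier, sequence, quality)


def parse_fastq(text: str) -> List[Record]:
    """Parse 4-line FASTQ records into (identifier, sequence, quality) tuples."""
    lines = [ln for ln in text.splitlines() if ln]  # drop blank lines
    if len(lines) % 4 != 0:
        raise ValueError("truncated FASTQ: line count is not a multiple of 4")
    records: List[Record] = []
    for i in range(0, len(lines), 4):
        header, seq, plus, qual = lines[i:i + 4]
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed record number {i // 4 + 1}")
        if len(seq) != len(qual):
            raise ValueError(f"sequence/quality length mismatch in record {i // 4 + 1}")
        records.append((header[1:], seq, qual))
    return records


def split_streams(records: List[Record]) -> Tuple[str, str, str]:
    """Separate identifiers, sequences and quality strings into three streams."""
    ids = "\n".join(r[0] for r in records)
    seqs = "\n".join(r[1] for r in records)
    quals = "\n".join(r[2] for r in records)
    return ids, seqs, quals


def compressed_sizes(text: str) -> Tuple[int, int]:
    """Return (zlib size of the interleaved FASTQ, summed zlib size of the 3 streams)."""
    whole = len(zlib.compress(text.encode(), 9))
    streams = sum(len(zlib.compress(s.encode(), 9))
                  for s in split_streams(parse_fastq(text)))
    return whole, streams


if __name__ == "__main__":
    sample = "@r1/1\nACGTACGT\n+\nIIIIII5#\n@r2/1\nACGTACGA\n+\nIIIII55#\n"
    for rec in parse_fastq(sample):
        print(rec)
    print(compressed_sizes(sample))
```

Separating the fields lets each stream use a model suited to its own alphabet (a handful of nucleotide symbols, a few dozen quality symbols, highly repetitive identifiers). Whether zlib on the split streams beats zlib on the interleaved file depends on input size and content, so the printed sizes are only a rough indicator.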

RESULTS

In this work, we propose SPRING, a reference-free compressor for FASTQ files. SPRING supports a wide variety of compression modes and features, including lossless compression, pairing-preserving compression, lossy compression of quality values, long-read compression and random access. SPRING achieves substantially better compression than existing tools. For example, it compresses 195 GB of 25× whole-genome human FASTQ data from Illumina's NovaSeq sequencer to less than 7 GB, around 1.6× smaller than the output of previous state-of-the-art FASTQ compressors. SPRING achieves this improvement while using comparable computational resources.
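The abstract lists lossy compression of quality values among SPRING's modes without describing the scheme. One widely used family of lossy methods is quality binning, which maps each Phred score to a representative value so the quality stream draws on a much smaller alphabet. The sketch below assumes a table modeled on Illumina's published 8-level binning (seven numeric bins plus no-calls); the bin boundaries, the Phred+33 offset and the names `BINS`, `bin_quality` and `bin_quality_string` are assumptions for illustration, not SPRING's exact behavior.

```python
from typing import List, Tuple

# Hypothetical bin table as (lowest Phred score in bin, representative score),
# ordered from the highest bin down. Boundaries are modeled on Illumina's
# 8-level binning and should be treated as an assumption.
BINS: List[Tuple[int, int]] = [
    (40, 40), (35, 37), (30, 33), (25, 27), (20, 22), (10, 15), (2, 6),
]


def bin_quality(q: int) -> int:
    """Map one Phred score to the representative value of its bin."""
    for lower, representative in BINS:
        if q >= lower:
            return representative
    return q  # scores below 2 are left unchanged (an assumption of this sketch)


def bin_quality_string(qual: str, offset: int = 33) -> str:
    """Apply binning to an ASCII-encoded quality string (Phred+33 by default)."""
    return "".join(chr(bin_quality(ord(c) - offset) + offset) for c in qual)


if __name__ == "__main__":
    print(bin_quality_string("I5+#"))  # prints I70'
    # The 42 possible input scores 0-41 collapse to 9 distinct output values.
    print(len({bin_quality(q) for q in range(42)}))  # prints 9
```

Binning is irreversible: the original scores cannot be recovered from the binned ones, so it is only appropriate when downstream analyses tolerate coarser quality values.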

AVAILABILITY AND IMPLEMENTATION

SPRING can be downloaded from https://github.com/shubhamchandak94/SPRING.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.
