SPRING: a next-generation compressor for FASTQ data.

Author affiliations

Department of Electrical Engineering, Stanford University, Stanford, CA, USA.

Department of Electrical and Computer Engineering, University of Illinois at Urbana-Champaign, Urbana, IL, USA.

Publication information

Bioinformatics. 2019 Aug 1;35(15):2674-2676. doi: 10.1093/bioinformatics/bty1015.

Abstract

MOTIVATION

High-throughput sequencing technologies produce huge amounts of data in the form of short genomic reads, associated quality values and read identifiers. Because these FASTQ datasets are highly structured, general-purpose compressors cannot exploit much of their inherent redundancy. Although many FASTQ compressors have been designed, most lack support for one or more crucial properties, such as variable-length reads, scalability to high-coverage datasets, pairing-preserving compression and lossless compression.
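To make the three data streams concrete, the following is a minimal sketch of reading the common four-line FASTQ record layout (identifier, bases, `+` separator, quality string). It is illustrative only and is not part of SPRING; the names `FastqRecord`, `parse_fastq` and the sample reads are assumptions introduced here.

```python
import io
from typing import Iterator, NamedTuple, TextIO


class FastqRecord(NamedTuple):
    """Hypothetical container for one FASTQ record (not SPRING's own type)."""
    identifier: str  # read identifier, without the leading '@'
    sequence: str    # genomic read (bases)
    quality: str     # quality values, one character per base


def parse_fastq(handle: TextIO) -> Iterator[FastqRecord]:
    """Yield records from a stream that uses four lines per record."""
    while True:
        header = handle.readline()
        if not header:  # end of stream
            return
        sequence = handle.readline().rstrip("\n")
        handle.readline()  # separator line starting with '+'
        quality = handle.readline().rstrip("\n")
        if not header.startswith("@") or len(sequence) != len(quality):
            raise ValueError("malformed FASTQ record")
        yield FastqRecord(header[1:].rstrip("\n"), sequence, quality)


# Two made-up reads of different lengths (variable-length input).
EXAMPLE = "@read1\nACGTACGT\n+\nIIIIHHH#\n@read2\nTTGCA\n+\nFFFF!\n"

records = list(parse_fastq(io.StringIO(EXAMPLE)))
for rec in records:
    print(rec.identifier, len(rec.sequence), rec.quality)
```

Each of the three fields has its own statistics (a four-letter alphabet with heavy overlap between reads, a skewed quality distribution, and highly repetitive identifiers), which is the kind of structure a general-purpose byte-oriented compressor does not model separately.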

RESULTS

In this work, we propose SPRING, a reference-free compressor for FASTQ files. SPRING supports a wide variety of compression modes and features, including lossless compression, pairing-preserving compression, lossy compression of quality values, long read compression and random access. SPRING achieves substantially better compression than existing tools. For example, it compresses 195 GB of 25× whole-genome human FASTQ from Illumina's NovaSeq sequencer to less than 7 GB, around 1.6× smaller than previous state-of-the-art FASTQ compressors. It achieves this improvement while using comparable computational resources.
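The reported figures can be turned into rough ratios with a little arithmetic. The sketch below uses only the numbers stated in the abstract; because the compressed size is given as "less than 7 GB" and the gain as "around 1.6×", the results are bounds and approximations rather than exact values, and the variable names are introduced here for illustration.

```python
original_gb = 195.0   # uncompressed FASTQ size from the abstract
spring_gb = 7.0       # upper bound: the abstract says "less than 7 GB"
relative_gain = 1.6   # "around 1.6x smaller" than previous compressors

# Lower bound on the overall compression ratio achieved by SPRING.
compression_ratio = original_gb / spring_gb

# Approximate upper bound on the output size of the previous best tools.
implied_previous_gb = spring_gb * relative_gain

print(f"compression ratio: at least {compression_ratio:.1f}x")
print(f"implied previous best: under about {implied_previous_gb:.1f} GB")
```

Under these assumptions, the compressed file is less than 4% of the original size, and the earlier tools would have needed roughly 11 GB for the same dataset.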

AVAILABILITY AND IMPLEMENTATION

SPRING can be downloaded from https://github.com/shubhamchandak94/SPRING.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.
