Reference-free lossless compression of nanopore sequencing reads using an approximate assembly approach.

Affiliation

Department of Electrical Engineering, Stanford University, Stanford, CA, 94305, USA.

Publication information

Sci Rep. 2023 Feb 6;13(1):2082. doi: 10.1038/s41598-023-29267-8.

DOI:10.1038/s41598-023-29267-8
PMID:36747011
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9902536/
Abstract

The amount of data produced by genome sequencing experiments has been growing rapidly over the past several years, making compression important for efficient storage, transfer and analysis of the data. In recent years, nanopore sequencing technologies have seen increasing adoption since they are portable, real-time and provide long reads. However, there has been limited progress on compression of nanopore sequencing reads obtained in FASTQ files since most existing tools are either general-purpose or specialized for short read data. We present NanoSpring, a reference-free compressor for nanopore sequencing reads, relying on an approximate assembly approach. We evaluate NanoSpring on a variety of datasets including bacterial, metagenomic, plant, animal, and human whole genome data. For recently basecalled high quality nanopore datasets, NanoSpring, which focuses only on the base sequences in the FASTQ file, uses just 0.35-0.65 bits per base, which is 3-6× lower than general-purpose compressors like gzip. NanoSpring is competitive in compression ratio and compression resource usage with the state-of-the-art tool CoLoRd while being significantly faster at decompression when using multiple threads (> 4× faster decompression with 20 threads). NanoSpring is available on GitHub at https://github.com/qm2/NanoSpring.
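
The "bits per base" figure quoted in the abstract is the compressed size in bits divided by the number of bases. The following is a minimal sketch of how such a figure could be measured for the gzip baseline on the base sequences alone. It is not NanoSpring's method or code: the helper names (`read_bases`, `bits_per_base`, `gzip_bits_per_base`) and the synthetic reads are assumptions made for illustration, and the sketch assumes four-line FASTQ records.

```python
import gzip
import random


def read_bases(fastq_text):
    """Concatenate the base sequences of FASTQ text (assumes 4 lines per record)."""
    lines = fastq_text.strip().splitlines()
    # In a four-line FASTQ record the sequence is the second line.
    return "".join(lines[i] for i in range(1, len(lines), 4))


def bits_per_base(compressed_size_bytes, num_bases):
    """Compressed size expressed in bits, divided by the number of bases."""
    return 8.0 * compressed_size_bytes / num_bases


def gzip_bits_per_base(bases):
    """Bits per base when the base string is compressed with gzip (the baseline)."""
    compressed = gzip.compress(bases.encode("ascii"))
    return bits_per_base(len(compressed), len(bases))


# Synthetic, error-free reads drawn from a random "genome" (illustrative data only).
random.seed(0)
genome = "".join(random.choice("ACGT") for _ in range(20000))
records = []
for i in range(50):
    start = random.randrange(0, len(genome) - 2000)
    seq = genome[start:start + 2000]
    records.append(f"@read{i}\n{seq}\n+\n{'I' * len(seq)}\n")
fastq_text = "".join(records)

bases = read_bases(fastq_text)
bpb = gzip_bits_per_base(bases)
print(f"{len(bases)} bases, gzip: {bpb:.2f} bits per base")
```

With this definition, a compressor that needs 0.5 bits per base where gzip needs 2 bits per base on the same data would be described as 4× lower.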

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/59d1/9902536/2b761094227a/41598_2023_29267_Fig1_HTML.jpg
Figure 2: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/59d1/9902536/ca82c0428d93/41598_2023_29267_Fig2_HTML.jpg
Figure 3: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/59d1/9902536/fe55ac7282c8/41598_2023_29267_Fig3_HTML.jpg
Figure 4: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/59d1/9902536/9716b5ab5cdb/41598_2023_29267_Fig4_HTML.jpg
Figure 5: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/59d1/9902536/001374c1ca30/41598_2023_29267_Fig5_HTML.jpg
