

Reference-free lossless compression of nanopore sequencing reads using an approximate assembly approach.

Author information

Department of Electrical Engineering, Stanford University, Stanford, CA, 94305, USA.

Publication information

Sci Rep. 2023 Feb 6;13(1):2082. doi: 10.1038/s41598-023-29267-8.

Abstract

The amount of data produced by genome sequencing experiments has been growing rapidly over the past several years, making compression important for efficient storage, transfer and analysis of the data. In recent years, nanopore sequencing technologies have seen increasing adoption since they are portable, real-time and provide long reads. However, there has been limited progress on compression of nanopore sequencing reads obtained in FASTQ files since most existing tools are either general-purpose or specialized for short-read data. We present NanoSpring, a reference-free compressor for nanopore sequencing reads, relying on an approximate assembly approach. We evaluate NanoSpring on a variety of datasets including bacterial, metagenomic, plant, animal, and human whole genome data. For recently basecalled high-quality nanopore datasets, NanoSpring, which focuses only on the base sequences in the FASTQ file, uses just 0.35-0.65 bits per base, which is 3-6× lower than general-purpose compressors like gzip. NanoSpring is competitive in compression ratio and compression resource usage with the state-of-the-art tool CoLoRd while being significantly faster at decompression when using multiple threads (> 4× faster decompression with 20 threads). NanoSpring is available on GitHub at https://github.com/qm2/NanoSpring.
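The bits-per-base figures quoted in the abstract are simply the compressed size in bits divided by the number of bases. The following is a minimal Python sketch of that arithmetic, not part of NanoSpring: the function names are hypothetical, the input is a synthetic random sequence rather than real nanopore data, and gzip is invoked through Python's standard `gzip` module only to illustrate how such a baseline could be measured.

```python
import gzip
import random


def bits_per_base(num_compressed_bytes: int, num_bases: int) -> float:
    """Compressed size in bits divided by the number of bases."""
    if num_bases <= 0:
        raise ValueError("num_bases must be positive")
    return 8 * num_compressed_bytes / num_bases


def gzip_bits_per_base(sequence: str) -> float:
    """Bits per base when the raw base string is compressed with gzip."""
    compressed = gzip.compress(sequence.encode("ascii"))
    return bits_per_base(len(compressed), len(sequence))


def compressed_size_bytes(num_bases: int, bpb: float) -> float:
    """Approximate compressed size in bytes at a given bits-per-base rate."""
    return num_bases * bpb / 8


if __name__ == "__main__":
    # Synthetic input (assumption): uniformly random bases, which lack the
    # overlap between reads that an assembly-based compressor relies on.
    rng = random.Random(0)
    seq = "".join(rng.choice("ACGT") for _ in range(100_000))
    print(f"gzip on random bases: {gzip_bits_per_base(seq):.2f} bits/base")

    # Size of one billion bases at the two ends of the range in the abstract.
    for bpb in (0.35, 0.65):
        size_mb = compressed_size_bytes(1_000_000_000, bpb) / 1e6
        print(f"{bpb} bits/base -> about {size_mb:.2f} MB per 10^9 bases")
```

Random bases carry 2 bits of information each, so no lossless compressor can go below 2 bits per base on that input. Rates well under 1 bit per base, as reported in the abstract, are possible only because real reads from the same genome overlap heavily.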


