ENANO: Encoder for NANOpore FASTQ files.

Author information

Facultad de Ingeniería, Universidad de la República, Montevideo 11300, Uruguay.

Xperi Corp, San Jose, CA 95134, USA.

Publication information

Bioinformatics. 2020 Aug 15;36(16):4506-4507. doi: 10.1093/bioinformatics/btaa551.

Abstract

MOTIVATION

The amount of genomic data generated globally is seeing explosive growth, leading to increasing needs for processing, storage and transmission resources, which motivates the development of efficient compression tools for these data. Work so far has focused mainly on the compression of data generated by short-read technologies. However, nanopore sequencing technologies are rapidly gaining popularity due to the advantages offered by the large increase in the average size of the produced reads, the reduction in their cost and the portability of the sequencing technology. We present ENANO (Encoder for NANOpore), a novel lossless compression algorithm especially designed for nanopore sequencing FASTQ files.

RESULTS

The main focus of ENANO is on the compression of the quality scores, as they dominate the size of the compressed file. ENANO offers two modes, Maximum Compression and Fast (default), which trade off compression efficiency and speed. We tested ENANO, the current state-of-the-art compressor SPRING and the general compressor pigz on several publicly available nanopore datasets. The results show that the proposed algorithm consistently achieves the best compression performance (in both modes) on every considered nanopore dataset, with an average improvement over pigz and SPRING of >24.7% and 6.3%, respectively. In addition, in terms of encoding and decoding speeds, ENANO is 2.9× and 1.7× faster than SPRING, respectively, with memory consumption up to 0.2 GB.
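
To make the claim that quality scores dominate the compressed size more concrete, the following minimal Python sketch splits a FASTQ file into its header, base and quality-score streams and compresses each one separately with zlib. It is not ENANO's algorithm: zlib stands in as a generic compressor, the function names (`split_fastq_streams`, `compressed_sizes`, `synthetic_fastq`) are hypothetical, and the input is synthetic random data rather than real nanopore reads, so the sizes it reports say nothing about the results above.

```python
import random
import zlib


def split_fastq_streams(fastq_text):
    """Split FASTQ text into header, base and quality-score streams.

    Assumes the simple 4-line-per-record layout (no wrapped sequences).
    """
    lines = fastq_text.strip().split("\n")
    if len(lines) % 4 != 0:
        raise ValueError("expected 4 lines per FASTQ record")
    return {
        "headers": "\n".join(lines[0::4]),    # '@' identifier lines
        "bases": "\n".join(lines[1::4]),      # nucleotide sequences
        "qualities": "\n".join(lines[3::4]),  # Phred+33 quality strings
    }


def compressed_sizes(streams, level=9):
    """Return the zlib-compressed size in bytes of each stream."""
    return {
        name: len(zlib.compress(text.encode("ascii"), level))
        for name, text in streams.items()
    }


def synthetic_fastq(num_reads=50, read_len=500, seed=0):
    """Build a toy FASTQ string from uniformly random bases and qualities.

    Illustrative assumption only: real nanopore data is not uniform.
    """
    rng = random.Random(seed)
    records = []
    for i in range(num_reads):
        seq = "".join(rng.choice("ACGT") for _ in range(read_len))
        qual = "".join(chr(33 + rng.randint(1, 40)) for _ in range(read_len))
        records.append(f"@read{i}\n{seq}\n+\n{qual}")
    return "\n".join(records) + "\n"


if __name__ == "__main__":
    sizes = compressed_sizes(split_fastq_streams(synthetic_fastq()))
    for name, size in sizes.items():
        print(f"{name}: {size} bytes after zlib")
```

Because the quality alphabet is much larger than the four-letter base alphabet, a generic compressor typically spends more bytes per symbol on the quality stream, which is why a compressor aimed at this file type would concentrate its modelling effort there.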

AVAILABILITY AND IMPLEMENTATION

ENANO is freely available for download at: https://github.com/guilledufort/EnanoFASTQ.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.

