ENANO: Encoder for NANOpore FASTQ files.

Author affiliations

Facultad de Ingeniería, Universidad de la República, Montevideo 11300, Uruguay.

Xperi Corp, San Jose, CA 95134, USA.

Publication information

Bioinformatics. 2020 Aug 15;36(16):4506-4507. doi: 10.1093/bioinformatics/btaa551.

Abstract

MOTIVATION

The amount of genomic data generated globally is seeing explosive growth, leading to increasing needs for processing, storage and transmission resources, which motivates the development of efficient compression tools for these data. Work so far has focused mainly on the compression of data generated by short-read technologies. However, nanopore sequencing technologies are rapidly gaining popularity due to the advantages offered by the large increase in the average size of the produced reads, the reduction in their cost and the portability of the sequencing technology. We present ENANO (Encoder for NANOpore), a novel lossless compression algorithm especially designed for nanopore sequencing FASTQ files.

RESULTS

The main focus of ENANO is on the compression of the quality scores, as they dominate the size of the compressed file. ENANO offers two modes, Maximum Compression and Fast (default), which trade off compression efficiency against speed. We tested ENANO, the current state-of-the-art compressor SPRING and the general-purpose compressor pigz on several publicly available nanopore datasets. The results show that the proposed algorithm consistently achieves the best compression performance (in both modes) on every considered nanopore dataset, with an average improvement over pigz and SPRING of >24.7% and 6.3%, respectively. In addition, ENANO encodes 2.9× faster and decodes 1.7× faster than SPRING, with memory consumption of up to 0.2 GB.
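To illustrate why the quality scores tend to dominate the compressed size, the following minimal sketch splits a FASTQ file into its header, base and quality streams and compresses each one separately with general-purpose zlib. It is not ENANO's algorithm. The function names (`split_fastq_streams`, `compressed_sizes`, `make_toy_fastq`) are hypothetical, the input is assumed to use four lines per record, and the toy data uses uniformly random quality values, which real nanopore data does not.

```python
import random
import zlib


def split_fastq_streams(fastq_text):
    """Split 4-line FASTQ records into header, base and quality streams.

    Assumption: every record occupies exactly four lines.
    """
    lines = fastq_text.strip().split("\n")
    if len(lines) % 4 != 0:
        raise ValueError("expected 4 lines per FASTQ record")
    headers, bases, quals = [], [], []
    for i in range(0, len(lines), 4):
        header, seq, plus, qual = lines[i:i + 4]
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed record starting at line {i + 1}")
        if len(seq) != len(qual):
            raise ValueError(f"length mismatch in record at line {i + 1}")
        headers.append(header)
        bases.append(seq)
        quals.append(qual)
    return {
        "headers": "\n".join(headers),
        "bases": "\n".join(bases),
        "quality": "\n".join(quals),
    }


def compressed_sizes(fastq_text, level=9):
    """Return the zlib-compressed size in bytes of each stream."""
    streams = split_fastq_streams(fastq_text)
    return {
        name: len(zlib.compress(data.encode("ascii"), level))
        for name, data in streams.items()
    }


def make_toy_fastq(n_reads=50, read_len=200, seed=0):
    """Build a synthetic FASTQ string (illustrative data only)."""
    rng = random.Random(seed)
    records = []
    for i in range(n_reads):
        seq = "".join(rng.choice("ACGT") for _ in range(read_len))
        # Phred+33 characters for quality values 0..39, chosen uniformly.
        qual = "".join(chr(33 + rng.randrange(40)) for _ in range(read_len))
        records.append(f"@read{i}\n{seq}\n+\n{qual}")
    return "\n".join(records) + "\n"


if __name__ == "__main__":
    for name, size in compressed_sizes(make_toy_fastq()).items():
        print(f"{name}: {size} bytes")
```

On this toy input the quality stream compresses to more bytes than the base stream, because the quality alphabet is much larger than the four-letter base alphabet. The actual ratios on real nanopore data depend on the dataset and on the compressor used.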

AVAILABILITY AND IMPLEMENTATION

ENANO is freely available for download at: https://github.com/guilledufort/EnanoFASTQ.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.
