JARVIS3: an efficient encoder for genomic data.

Authors

Sousa Maria J P, Pinho Armando J, Pratas Diogo

Affiliations

Institute of Electronics and Informatics Engineering of Aveiro (IEETA), University of Aveiro, Campus Universitário de Santiago, 3810-193 Aveiro, Portugal.

Department of Electronics, Telecommunications and Informatics (DETI), University of Aveiro, Campus Universitário de Santiago, 3810-193 Aveiro, Portugal.

Publication information

Bioinformatics. 2024 Nov 28;40(12). doi: 10.1093/bioinformatics/btae725.

Abstract

MOTIVATION

Large-scale genomic projects grapple with the complex challenge of reducing medium- and long-term storage space and its associated energy consumption, monetary costs, and environmental footprint.

RESULTS

We present JARVIS3, a tool for the efficient reference-free compression of genomic sequences. JARVIS3 introduces enhanced table memory models and probabilistic lookup tables applied in repeat models, optimizations that substantially improve computational efficiency. JARVIS3 offers three profiles: (i) rapid computation with moderate compression, (ii) a balanced trade-off between time and compression, and (iii) slower computation with significantly higher compression ratios. JARVIS3 is implemented in C and builds on its predecessor, JARVIS2. It shows substantial speed improvements relative to JARVIS2 while providing slightly better compression. We also provide a versatile C/Bash implementation that facilitates application to FASTA and FASTQ data, including support for parallel computation. In addition, JARVIS3 includes a mode for outputting bit information and reports the Normalized Compression and bit rates, facilitating compression-based analysis. This establishes JARVIS3 as an open-source solution for genomic data compression and analysis.
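
To make the two reported measures concrete, the following is a minimal sketch of how a bit rate and a Normalized Compression (NC) value can be derived from a compressed size. It assumes the commonly used definition NC = C(x) / (|x| log2 |A|), where C(x) is the compressed size in bits, |x| the sequence length, and |A| the alphabet size (4 for DNA). The function names are hypothetical, and zlib is used only as a stand-in compressor; this is not JARVIS3's code or command-line interface.

```python
import math
import zlib


def bits_per_base(sequence: str, compressed_size_bytes: int) -> float:
    """Average number of compressed bits spent per symbol of the sequence."""
    if not sequence:
        raise ValueError("sequence must not be empty")
    return 8 * compressed_size_bytes / len(sequence)


def normalized_compression(sequence: str, compressed_size_bytes: int,
                           alphabet_size: int = 4) -> float:
    """Compressed size in bits divided by the size of a fixed-length
    encoding of the same sequence (len * log2(alphabet_size) bits)."""
    return bits_per_base(sequence, compressed_size_bytes) / math.log2(alphabet_size)


def standin_compressed_size(sequence: str) -> int:
    """Compressed size in bytes from zlib, an assumed stand-in (NOT JARVIS3)."""
    return len(zlib.compress(sequence.encode("ascii"), 9))


if __name__ == "__main__":
    # A highly repetitive toy sequence; real genomes are far less regular.
    toy = "ACGT" * 2500
    size = standin_compressed_size(toy)
    print(f"bits per base: {bits_per_base(toy, size):.4f}")
    print(f"NC:            {normalized_compression(toy, size):.4f}")
```

Under this definition, an NC close to 1 means the compressor did no better than a plain 2-bit-per-base encoding, while values near 0 indicate highly redundant (for example, repetitive) sequences. In practice the compressed size would come from the compressor under study rather than from zlib.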

AVAILABILITY AND IMPLEMENTATION

JARVIS3 is freely available at https://github.com/cobilab/jarvis3.
