RabbitFX: Efficient Framework for FASTA/Q File Parsing on Modern Multi-Core Platforms.

Publication information

IEEE/ACM Trans Comput Biol Bioinform. 2023 May-Jun;20(3):2341-2348. doi: 10.1109/TCBB.2022.3219114. Epub 2023 Jun 5.

Abstract

The continuous growth of sequencing data has driven the development of a wide range of bioinformatics tools. However, many of them cannot fully exploit the resources of modern multi-core systems because file parsing is a bottleneck, which leads to slow execution times. This motivates the design of an efficient method for parsing sequencing data that can exploit the power of modern hardware, especially modern CPUs paired with fast storage devices. We have developed RabbitFX, a fast, efficient, and easy-to-use framework for processing biological sequencing data on modern multi-core platforms. It reads FASTA and FASTQ files efficiently by combining a lightweight parsing method with an optimized formatting implementation. Furthermore, we provide user-friendly, modular C++ APIs that can be easily integrated into applications to increase their file parsing speed. As a proof of concept, we have integrated RabbitFX into three I/O-intensive applications: fastp, Ktrim, and Mash. Our evaluation shows that integrating RabbitFX yields speedups of at least 11.6 (6.6), 2.4 (2.4), and 3.7 (3.2) for fastp, Ktrim, and Mash, respectively, over the original versions on plain (gzip-compressed) files. These case studies demonstrate that RabbitFX can be easily integrated into a variety of NGS analysis tools to significantly reduce their runtimes. It is open-source software available at https://github.com/RabbitBio/RabbitFX.
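To make the idea of lightweight FASTQ parsing concrete, the following is a minimal sketch in C++. It is not RabbitFX's API or implementation: the names `FastqRecord`, `next_line`, and `parse_fastq` are hypothetical, and the sketch assumes the common four-line FASTQ layout (header, sequence, `+` separator, quality) held in a single in-memory buffer. Records are returned as views into that buffer, so no sequence data is copied.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>
#include <vector>

// Hypothetical record type (not RabbitFX's): all fields are views into the
// caller's buffer, so the buffer must outlive the records.
struct FastqRecord {
    std::string_view name;      // header line without the leading '@'
    std::string_view sequence;  // base calls
    std::string_view quality;   // per-base quality string
};

// Return the line starting at pos (without '\n' or a trailing '\r')
// and advance pos past it.
inline std::string_view next_line(std::string_view buf, std::size_t& pos) {
    assert(pos <= buf.size());
    std::size_t end = buf.find('\n', pos);
    if (end == std::string_view::npos) end = buf.size();
    std::string_view line = buf.substr(pos, end - pos);
    pos = (end < buf.size()) ? end + 1 : end;
    if (!line.empty() && line.back() == '\r') line.remove_suffix(1);
    return line;
}

// Parse four-line FASTQ records from buf. Parsing stops at the first
// malformed or incomplete record; records parsed before it are returned.
inline std::vector<FastqRecord> parse_fastq(std::string_view buf) {
    std::vector<FastqRecord> records;
    std::size_t pos = 0;
    while (pos < buf.size()) {
        std::string_view header = next_line(buf, pos);
        if (header.empty()) continue;  // skip blank lines between records
        std::string_view seq = next_line(buf, pos);
        std::string_view plus = next_line(buf, pos);
        std::string_view qual = next_line(buf, pos);
        bool well_formed = header.front() == '@' && !plus.empty() &&
                           plus.front() == '+' && seq.size() == qual.size();
        if (!well_formed) break;
        records.push_back({header.substr(1), seq, qual});
    }
    return records;
}
```

In a multi-core setting, one plausible design is for a reader thread to fill large buffers and hand each one to a worker that calls a function like `parse_fastq` on it; whether RabbitFX is structured this way is not stated in the abstract.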
