Accelerating spliced alignment of long RNA sequencing reads using parallel maximal exact match retrieval.

Affiliations

School of Computer Science and Information Security, Guilin University of Electronic Technology, Guilin, 541004, China.

School of Computer Science and Information Security, Guilin University of Electronic Technology, Guilin, 541004, China; College of Computer Science and Technology, Huaqiao University, Xiamen, 361021, China.

Publication information

Comput Biol Med. 2024 Jun;175:108542. doi: 10.1016/j.compbiomed.2024.108542. Epub 2024 May 3.

Abstract

Third-generation sequencing technologies have transformed genomics, and the resulting exponential growth in sequencing data demands algorithms that are both accurate and fast. We developed a parallel maximal exact match (MEM) retrieval algorithm, with no loss of accuracy, to address the computational bottleneck of uLTRA, a leading spliced alignment algorithm known for its precision on long RNA sequencing (RNA-seq) reads. The algorithm uses a multi-threaded design that processes multiple reads concurrently. We also serialize the index required for MEM retrieval so that it can be reused, which shortens startup in practical tasks. Extensive experiments show significant improvements in runtime, speedup, throughput, and memory usage. On the largest human dataset, the algorithm achieves a speedup of 10.78×, substantially improving throughput. Integrating the parallel MEM retrieval algorithm into the uLTRA pipeline adds a second layer of parallelism and consistently yields a speedup of 4.99× over uLTRA's multi-process, single-threaded execution. Analysis of the experimental results shows that the algorithm makes effective use of parallel processing and performs well on large datasets. This study demonstrates parallelization strategies for MEM retrieval in the context of spliced alignment and thereby facilitates RNA-seq data analysis. The code is available at https://github.com/RongxingWong/AcceleratingSplicedAlignment.
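
The two ideas in the abstract, per-read parallel MEM retrieval and a serialized, reusable index, can be illustrated with a minimal Python sketch. This is not the authors' implementation: it assumes a plain k-mer hash index in place of whatever index structure the paper uses, and every function name here (`build_kmer_index`, `save_index`, `load_index`, `find_mems`, `find_mems_parallel`) is hypothetical. It reports every exact match of length at least `k` that cannot be extended to the left or right.

```python
import pickle
import tempfile
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def build_kmer_index(ref: str, k: int) -> dict:
    """Map every k-mer of the reference to its start positions (assumed index type)."""
    index = defaultdict(list)
    for j in range(len(ref) - k + 1):
        index[ref[j:j + k]].append(j)
    return dict(index)


def save_index(index: dict, path: Path) -> None:
    """Serialize the index so later runs can skip construction."""
    with open(path, "wb") as handle:
        pickle.dump(index, handle)


def load_index(path: Path) -> dict:
    """Reload a previously serialized index."""
    with open(path, "rb") as handle:
        return pickle.load(handle)


def find_mems(read: str, ref: str, index: dict, k: int) -> list:
    """Return (read_pos, ref_pos, length) for each MEM of length >= k."""
    mems = []
    for i in range(len(read) - k + 1):
        for j in index.get(read[i:i + k], ()):
            # Skip seeds that can be extended to the left: not left-maximal.
            if i > 0 and j > 0 and read[i - 1] == ref[j - 1]:
                continue
            # Extend to the right until a mismatch or the end of either string.
            length = k
            while (i + length < len(read) and j + length < len(ref)
                   and read[i + length] == ref[j + length]):
                length += 1
            mems.append((i, j, length))
    return mems


def find_mems_parallel(reads: list, ref: str, index: dict, k: int,
                       threads: int = 4) -> list:
    """Process reads concurrently; results keep the input order of the reads."""
    with ThreadPoolExecutor(max_workers=threads) as pool:
        return list(pool.map(lambda r: find_mems(r, ref, index, k), reads))


if __name__ == "__main__":
    ref = "ACGTACGTTTGACC"  # toy reference, made up for illustration
    reads = ["CGTACG", "TTGA"]
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "ref.k3.idx"
        save_index(build_kmer_index(ref, 3), path)
        index = load_index(path)  # reused instead of rebuilt
    for read, mems in zip(reads, find_mems_parallel(reads, ref, index, 3)):
        print(read, mems)
```

Because each read is handled independently against a read-only index, the parallel result is identical to the serial one, which is the sense in which such a scheme loses no accuracy. In standard CPython the global interpreter lock limits the speedup of CPU-bound threads, so this sketch shows the structure only and makes no claim about the reported performance figures.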

