Accelerating spliced alignment of long RNA sequencing reads using parallel maximal exact match retrieval.

Affiliations

School of Computer Science and Information Security, Guilin University of Electronic Technology, Guilin, 541004, China.

School of Computer Science and Information Security, Guilin University of Electronic Technology, Guilin, 541004, China; College of Computer Science and Technology, Huaqiao University, Xiamen, 361021, China.

Publication information

Comput Biol Med. 2024 Jun;175:108542. doi: 10.1016/j.compbiomed.2024.108542. Epub 2024 May 3.

DOI:10.1016/j.compbiomed.2024.108542

PMID:38714048

Abstract

The genomics landscape has been transformed by the emergence of third-generation sequencing technologies. With sequencing data growing exponentially, there is an urgent demand for accurate and rapid algorithms to handle it. We therefore developed a parallelized yet accuracy-lossless algorithm for maximal exact match (MEM) retrieval to address the computational bottleneck of uLTRA, a leading spliced alignment algorithm known for its precision in handling long RNA sequencing (RNA-seq) reads. The algorithm uses a multi-threaded strategy that allows multiple reads to be processed concurrently. Additionally, we implemented serialization of the index required for MEM retrieval so that it can be reused, which accelerates startup for practical tasks. Extensive experiments demonstrate that our parallel algorithm achieves significant improvements in runtime, speedup, throughput, and memory usage. When applied to the largest human dataset, the algorithm achieves a speedup of 10.78×, significantly improving throughput. Moreover, integrating the parallel MEM retrieval algorithm into the uLTRA pipeline introduces a dual-layered parallel capability, consistently yielding a speedup of 4.99× compared to the multi-process, single-threaded execution of uLTRA. Analysis of the experimental results shows that the algorithm makes effective use of parallel processing and performs well on large datasets. This study showcases parallelization strategies for MEM retrieval in the context of spliced alignment algorithms, facilitating RNA-seq data analysis. The code is available at https://github.com/RongxingWong/AcceleratingSplicedAlignment.
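The two ideas named in the abstract, per-read parallel MEM retrieval and a serialized, reusable index, can be illustrated with a small Python sketch. This is not the authors' implementation (that is in the repository linked above). The k-mer seed-and-extend index, the function names (`build_index`, `find_mems`, `retrieve_parallel`, `serialize_index`, `deserialize_index`), and the use of `pickle` are all assumptions made for illustration. Python threads are also limited by the global interpreter lock, so the sketch shows the structure of the approach rather than its speedup.

```python
import pickle
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def build_index(ref: str, k: int) -> dict:
    """Map every k-mer of the reference to the list of its start positions."""
    index = defaultdict(list)
    for i in range(len(ref) - k + 1):
        index[ref[i:i + k]].append(i)
    return dict(index)


def serialize_index(index: dict) -> bytes:
    """Turn the index into bytes that can be written to disk and reused later."""
    return pickle.dumps(index)


def deserialize_index(blob: bytes) -> dict:
    """Restore an index produced by serialize_index."""
    return pickle.loads(blob)


def find_mems(read: str, ref: str, index: dict, k: int) -> list:
    """Return sorted (read_pos, ref_pos, length) maximal exact matches of length >= k."""
    mems = set()
    for q in range(len(read) - k + 1):
        for r in index.get(read[q:q + k], ()):
            # Extend the seed to the left while characters agree.
            q_start, r_start = q, r
            while q_start > 0 and r_start > 0 and read[q_start - 1] == ref[r_start - 1]:
                q_start, r_start = q_start - 1, r_start - 1
            # Extend the seed to the right while characters agree.
            q_end, r_end = q + k, r + k
            while q_end < len(read) and r_end < len(ref) and read[q_end] == ref[r_end]:
                q_end, r_end = q_end + 1, r_end + 1
            # Seeds inside the same match extend to the same triple; the set dedups them.
            mems.add((q_start, r_start, q_end - q_start))
    return sorted(mems)


def retrieve_parallel(reads: list, ref: str, index: dict, k: int, threads: int = 4) -> list:
    """Process reads concurrently; results come back in the same order as the input reads."""
    with ThreadPoolExecutor(max_workers=threads) as pool:
        return list(pool.map(lambda rd: find_mems(rd, ref, index, k), reads))
```

Because each read is handled independently against a read-only index, the parallel result is identical to the serial one, which is the sense in which such a scheme loses no accuracy. The serialized bytes would typically be written to a file once and loaded at startup instead of rebuilding the index.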

Similar articles

Accelerating spliced alignment of long RNA sequencing reads using parallel maximal exact match retrieval.

Comput Biol Med. 2024 Jun;175:108542. doi: 10.1016/j.compbiomed.2024.108542. Epub 2024 May 3.

ViraPipe: scalable parallel pipeline for viral metagenome analysis from next generation sequencing reads.

Bioinformatics. 2018 Mar 15;34(6):928-935. doi: 10.1093/bioinformatics/btx702.

Read-Split-Run: an improved bioinformatics pipeline for identification of genome-wide non-canonical spliced regions using RNA-Seq data.

BMC Genomics. 2016 Aug 22;17 Suppl 7(Suppl 7):503. doi: 10.1186/s12864-016-2896-7.

smallWig: parallel compression of RNA-seq WIG files.

Bioinformatics. 2016 Jan 15;32(2):173-80. doi: 10.1093/bioinformatics/btv561. Epub 2015 Sep 30.

Supersplat--spliced RNA-seq alignment.

Bioinformatics. 2010 Jun 15;26(12):1500-5. doi: 10.1093/bioinformatics/btq206. Epub 2010 Apr 21.

STAR: ultrafast universal RNA-seq aligner.

Bioinformatics. 2013 Jan 1;29(1):15-21. doi: 10.1093/bioinformatics/bts635. Epub 2012 Oct 25.

Fast noisy long read alignment with multi-level parallelism.

BMC Bioinformatics. 2025 May 2;26(1):118. doi: 10.1186/s12859-025-06129-w.

HSA: a heuristic splice alignment tool.

BMC Syst Biol. 2013;7 Suppl 2(Suppl 2):S10. doi: 10.1186/1752-0509-7-S2-S10. Epub 2013 Dec 17.

Optimizing performance of GATK workflows using Apache Arrow In-Memory data framework.

BMC Genomics. 2020 Nov 18;21(Suppl 10):683. doi: 10.1186/s12864-020-07013-y.

MinimapR: A parallel alignment tool for the analysis of large-scale third-generation sequencing data.

Comput Biol Chem. 2022 Aug;99:107735. doi: 10.1016/j.compbiolchem.2022.107735. Epub 2022 Jul 13.

Cited by

Exploring vaginal microbiome: from traditional methods to metagenomic next-generation sequencing-a systematic review.

Front Microbiol. 2025 Aug 14;16:1578681. doi: 10.3389/fmicb.2025.1578681. eCollection 2025.
