
Fast noisy long read alignment with multi-level parallelism.

Authors

Xia Zeyu, Yang Canqun, Peng Chenchen, Guo Yifei, Guo Yufei, Tang Tao, Cui Yingbo

Affiliations

College of Computer Science and Technology, National University of Defense Technology, 410073, Changsha, China.

National Supercomputer Center in Tianjin, 300457, Tianjin, China.

Publication

BMC Bioinformatics. 2025 May 2;26(1):118. doi: 10.1186/s12859-025-06129-w.

PMID: 40316905
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12049014/
Abstract

BACKGROUND

The advent of Single Molecule Real-Time (SMRT) sequencing has overcome many limitations of second-generation sequencing, such as limited read lengths and PCR amplification biases. However, longer reads increase data volume exponentially, and high error rates make many existing alignment tools inapplicable. In addition, the performance ceiling of a single CPU restricts the effectiveness of alignment algorithms for SMRT sequencing.

RESULTS

To address these challenges, we introduce ParaHAT, a parallel alignment algorithm for noisy long reads. ParaHAT exploits vector-level, thread-level, process-level, and heterogeneous parallelism. We redesign the layout of the dynamic programming matrices to eliminate data dependencies in base-level alignment, enabling effective vectorization. We further increase computational speed through heterogeneous parallelism and implement the algorithm for multi-node computing with MPI, overcoming the computational limits of a single node.
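The abstract does not say which matrix layout ParaHAT adopts. As a rough illustration of how a layout change can remove the data dependency that blocks vectorization, the sketch below uses the standard anti-diagonal traversal on unit-cost edit distance. Both the scoring scheme and the function names are assumptions made for this example and are not taken from the paper. In the row-wise version each cell needs its left neighbour from the same row, so the inner loop is serial. In the anti-diagonal version every cell on diagonal d = i + j reads only diagonals d-1 and d-2, so the inner loop iterations are independent and could be mapped onto SIMD lanes.

```python
def edit_distance_rowwise(a: str, b: str) -> int:
    """Reference DP, filled row by row.

    cur[j] depends on cur[j - 1] from the same row, so the inner loop
    carries a dependency and cannot be vectorized directly.
    """
    prev = list(range(len(b) + 1))
    for i in range(1, len(a) + 1):
        cur = [i] + [0] * len(b)
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            cur[j] = min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + cost)
        prev = cur
    return prev[len(b)]


def edit_distance_antidiagonal(a: str, b: str) -> int:
    """Same recurrence, stored and traversed by anti-diagonal d = i + j.

    Each diagonal is kept in an array indexed by i, holding D[i][d - i].
    Cells on diagonal d read only diagonals d - 1 and d - 2, so the
    iterations of the inner loop do not depend on each other.
    """
    n, m = len(a), len(b)
    prev2 = [0] * (n + 1)  # diagonal d - 2
    prev1 = [0] * (n + 1)  # diagonal d - 1
    for d in range(n + m + 1):
        cur = [0] * (n + 1)
        lo, hi = max(0, d - m), min(n, d)
        for i in range(lo, hi + 1):  # independent iterations
            j = d - i
            if i == 0:
                cur[i] = j  # first row: j insertions
            elif j == 0:
                cur[i] = i  # first column: i deletions
            else:
                cost = 0 if a[i - 1] == b[j - 1] else 1
                cur[i] = min(
                    prev1[i - 1] + 1,     # D[i-1][j]
                    prev1[i] + 1,         # D[i][j-1]
                    prev2[i - 1] + cost,  # D[i-1][j-1]
                )
        prev2, prev1 = prev1, cur
    return prev1[n]


print(edit_distance_rowwise("kitten", "sitting"))       # prints 3
print(edit_distance_antidiagonal("kitten", "sitting"))  # prints 3
```

A production aligner would use affine gap scoring and hardware vector instructions rather than a Python loop. The point here is only that the traversal order determines whether the inner loop can run in lockstep.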

CONCLUSIONS

Performance evaluations show that ParaHAT achieves a 10.03x speedup in base-level alignment and, on 128 nodes, a parallel acceleration ratio of 94.61 and a weak scalability metric of 98.98%.
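To put the 128-node figures in context, the short sketch below applies the usual textbook definitions of scaling efficiency. It assumes the reported acceleration ratio of 94.61 is measured against a single node, which the abstract does not state, and the function names are hypothetical.

```python
def parallel_efficiency(speedup: float, nodes: int) -> float:
    """Strong-scaling efficiency: achieved speedup divided by node count."""
    return speedup / nodes


def weak_scaling_efficiency(t_base: float, t_scaled: float) -> float:
    """Weak-scaling efficiency: baseline runtime over the runtime measured
    when the node count and the total problem size grow together."""
    return t_base / t_scaled


# Figures from the abstract; the single-node baseline is an assumption.
eff = parallel_efficiency(94.61, 128)
print(f"{eff:.1%}")  # prints 73.9%
```

Under that assumption, a ratio of 94.61 on 128 nodes corresponds to roughly 74% parallel efficiency, while the 98.98% weak scalability figure is already an efficiency and needs no further conversion.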

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/81b2/12049014/d01513e426d6/12859_2025_6129_Fig1_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/81b2/12049014/d979ff25b78e/12859_2025_6129_Fig2_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/81b2/12049014/b86521e44fd3/12859_2025_6129_Fig3_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/81b2/12049014/6f6c7cb4e4ad/12859_2025_6129_Fig4_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/81b2/12049014/01fde2542065/12859_2025_6129_Fig5_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/81b2/12049014/cf6348aa12cc/12859_2025_6129_Fig6_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/81b2/12049014/dc01e1369626/12859_2025_6129_Fig7_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/81b2/12049014/f69c651f9245/12859_2025_6129_Fig8_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/81b2/12049014/a46d0f6e27d0/12859_2025_6129_Fig9_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/81b2/12049014/589611d9e714/12859_2025_6129_Fig10_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/81b2/12049014/8444788f9ab8/12859_2025_6129_Fig11_HTML.jpg

Similar articles

1
Fast noisy long read alignment with multi-level parallelism.
BMC Bioinformatics. 2025 May 2;26(1):118. doi: 10.1186/s12859-025-06129-w.
2
HISEA: HIerarchical SEed Aligner for PacBio data.
BMC Bioinformatics. 2017 Dec 19;18(1):564. doi: 10.1186/s12859-017-1953-9.
3
MinimapR: A parallel alignment tool for the analysis of large-scale third-generation sequencing data.
Comput Biol Chem. 2022 Aug;99:107735. doi: 10.1016/j.compbiolchem.2022.107735. Epub 2022 Jul 13.
4
Accelerating spliced alignment of long RNA sequencing reads using parallel maximal exact match retrieval.
Comput Biol Med. 2024 Jun;175:108542. doi: 10.1016/j.compbiomed.2024.108542. Epub 2024 May 3.
5
Parallel and Scalable Short-Read Alignment on Multi-Core Clusters Using UPC+.
PLoS One. 2016 Jan 5;11(1):e0145490. doi: 10.1371/journal.pone.0145490. eCollection 2016.
6
RAMICS: trainable, high-speed and biologically relevant alignment of high-throughput sequencing reads to coding DNA.
Nucleic Acids Res. 2014 Jul;42(13):e106. doi: 10.1093/nar/gku473. Epub 2014 May 26.
7
S-conLSH: alignment-free gapped mapping of noisy long reads.
BMC Bioinformatics. 2021 Feb 11;22(1):64. doi: 10.1186/s12859-020-03918-3.
8
GateKeeper: a new hardware architecture for accelerating pre-alignment in DNA short read mapping.
Bioinformatics. 2017 Nov 1;33(21):3355-3363. doi: 10.1093/bioinformatics/btx342.
9
Faster single-end alignment generation utilizing multi-thread for BWA.
Biomed Mater Eng. 2015;26 Suppl 1:S1791-6. doi: 10.3233/BME-151480.
10
RandAL: a randomized approach to aligning DNA sequences to reference genomes.
BMC Genomics. 2014;15 Suppl 5(Suppl 5):S2. doi: 10.1186/1471-2164-15-S5-S2. Epub 2014 Jul 14.
