LAMSA: fast split read alignment with long approximate matches.

Affiliation

Center for Bioinformatics, Harbin Institute of Technology, Harbin, Heilongjiang 150001, China.

Publication information

Bioinformatics. 2017 Jan 15;33(2):192-201. doi: 10.1093/bioinformatics/btw594. Epub 2016 Sep 25.

DOI:10.1093/bioinformatics/btw594

PMID:27667793

Abstract

MOTIVATION

Read lengths continue to increase with the development of novel high-throughput sequencing technologies, which has enormous potential for cutting-edge genomic studies. However, longer reads span the breakpoints of structural variants (SVs) more frequently than shorter reads do. This may greatly affect read alignment, since most state-of-the-art aligners are designed to handle relatively small variants within a co-linear alignment framework. Meanwhile, long read alignment is still not as efficient as short read alignment, which could also become a bottleneck for its upcoming wide application.

RESULTS

We propose the long approximate matches-based split aligner (LAMSA), a novel split read alignment approach. It takes advantage of the rareness of SVs to implement a specifically designed two-step strategy. LAMSA first splits the read into relatively long fragments and aligns them co-linearly, which resolves small variations and sequencing errors and mitigates the effect of repeats. The fragment alignments are then used in a sparse dynamic programming-based split alignment step that handles large or non-co-linear variants. We benchmarked LAMSA on simulated and real datasets with various read lengths and sequencing error rates; the results demonstrate that it is substantially faster than state-of-the-art long read aligners while also handling various categories of SVs well.
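The two-step idea can be illustrated with a small Python sketch. This is not LAMSA's implementation: the `Hit` record, the `split_read` and `chain_hits` helpers, and the `max_gap` and `jump_penalty` parameters are all hypothetical, and the chaining uses a plain quadratic dynamic program over fragment hits rather than an optimized sparse one. It assumes each fragment has already been aligned somewhere on the reference and only shows how co-linear hits can be joined for free while a non-co-linear jump (a putative SV breakpoint) pays a fixed penalty.

```python
from typing import List, NamedTuple, Tuple


class Hit(NamedTuple):
    """Hypothetical fragment alignment: offset on the read, position on the reference, length."""
    read_start: int
    ref_start: int
    length: int


def split_read(read: str, frag_len: int) -> List[Tuple[int, str]]:
    """Step 1 (illustrative): cut a read into consecutive fragments of at most frag_len bases."""
    return [(i, read[i:i + frag_len]) for i in range(0, len(read), frag_len)]


def chain_hits(hits: List[Hit], max_gap: int, jump_penalty: int) -> List[Hit]:
    """Step 2 (illustrative): choose the best-scoring ordered subset of fragment hits.

    Hits whose diagonals differ by at most max_gap are treated as co-linear and
    chain at no cost; any other transition pays jump_penalty.
    """
    hits = sorted(hits, key=lambda h: h.read_start)
    n = len(hits)
    if n == 0:
        return []
    score = [h.length for h in hits]  # best chain score ending at each hit
    prev = [-1] * n                   # back-pointers for traceback
    for j in range(n):
        for i in range(j):
            a, b = hits[i], hits[j]
            if a.read_start + a.length > b.read_start:
                continue  # the two hits overlap on the read
            diag_shift = abs((b.ref_start - b.read_start) - (a.ref_start - a.read_start))
            cost = 0 if diag_shift <= max_gap else jump_penalty
            candidate = score[i] + b.length - cost
            if candidate > score[j]:
                score[j] = candidate
                prev[j] = i
    # Trace back from the highest-scoring hit.
    k = max(range(n), key=score.__getitem__)
    chain = []
    while k != -1:
        chain.append(hits[k])
        k = prev[k]
    return chain[::-1]


# Toy input: three 100-base fragments, a large deletion before the third one,
# and one spurious repeat hit for the second fragment.
toy_hits = [
    Hit(0, 1000, 100),
    Hit(100, 9000, 100),  # repeat-induced hit, far from the others
    Hit(100, 1100, 100),
    Hit(200, 5200, 100),  # lands 4000 bases downstream of the co-linear position
]
chain = chain_hits(toy_hits, max_gap=10, jump_penalty=30)
```

In this toy case the chain keeps the two co-linear hits and the hit after the jump, and it drops the repeat-induced hit because reaching it and leaving it would each cost a penalty.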

AVAILABILITY AND IMPLEMENTATION

LAMSA is available at https://github.com/hitbc/LAMSA

CONTACT

Ydwang@hit.edu.cn

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.

Similar articles

deBGA: read alignment with de Bruijn graph-based seed and extension.

Bioinformatics. 2016 Nov 1;32(21):3224-3232. doi: 10.1093/bioinformatics/btw371. Epub 2016 Jul 4.

rHAT: fast alignment of noisy long reads with regional hashing.

Bioinformatics. 2016 Jun 1;32(11):1625-31. doi: 10.1093/bioinformatics/btv662. Epub 2015 Nov 14.

Arioc: GPU-accelerated alignment of short bisulfite-treated reads.

Bioinformatics. 2018 Aug 1;34(15):2673-2675. doi: 10.1093/bioinformatics/bty167.

Evaluation of tools for long read RNA-seq splice-aware alignment.

Bioinformatics. 2018 Mar 1;34(5):748-754. doi: 10.1093/bioinformatics/btx668.

Minimap2: pairwise alignment for nucleotide sequences.

Bioinformatics. 2018 Sep 15;34(18):3094-3100. doi: 10.1093/bioinformatics/bty191.

Kart: a divide-and-conquer algorithm for NGS read alignment.

Bioinformatics. 2017 Aug 1;33(15):2281-2287. doi: 10.1093/bioinformatics/btx189.

GateKeeper: a new hardware architecture for accelerating pre-alignment in DNA short read mapping.

Bioinformatics. 2017 Nov 1;33(21):3355-3363. doi: 10.1093/bioinformatics/btx342.

rMFilter: acceleration of long read-based structure variation calling by chimeric read filtering.

Bioinformatics. 2017 Sep 1;33(17):2750-2752. doi: 10.1093/bioinformatics/btx279.

RepLong: de novo repeat identification using long read sequencing data.

Bioinformatics. 2018 Apr 1;34(7):1099-1107. doi: 10.1093/bioinformatics/btx717.

Cited by

pathMap: a path-based mapping tool for long noisy reads with high sensitivity.

Brief Bioinform. 2024 Jan 22;25(2). doi: 10.1093/bib/bbae107.

Benchmarking long-read genome sequence alignment tools for human genomics applications.

PeerJ. 2023 Dec 18;11:e16515. doi: 10.7717/peerj.16515. eCollection 2023.

invMap: a sensitive mapping tool for long noisy reads with inversion structural variants.

Bioinformatics. 2023 Dec 1;39(12). doi: 10.1093/bioinformatics/btad726.

Resolving complex structural variants via nanopore sequencing.

Front Genet. 2023 Aug 16;14:1213917. doi: 10.3389/fgene.2023.1213917. eCollection 2023.

A survey of mapping algorithms in the long-reads era.

Genome Biol. 2023 Jun 1;24(1):133. doi: 10.1186/s13059-023-02972-3.

kngMap: Sensitive and Fast Mapping Algorithm for Noisy Long Reads Based on the K-Mer Neighborhood Graph.

Front Genet. 2022 May 5;13:890651. doi: 10.3389/fgene.2022.890651. eCollection 2022.

CNV-MEANN: A Neural Network and Mind Evolutionary Algorithm-Based Detection of Copy Number Variations From Next-Generation Sequencing Data.

Front Genet. 2021 Aug 16;12:700874. doi: 10.3389/fgene.2021.700874. eCollection 2021.

Technology dictates algorithms: recent developments in read alignment.

Genome Biol. 2021 Aug 26;22(1):249. doi: 10.1186/s13059-021-02443-7.

Fast and Accurate Classification of Meta-Genomics Long Reads With deSAMBA.

Front Cell Dev Biol. 2021 Apr 28;9:643645. doi: 10.3389/fcell.2021.643645. eCollection 2021.

smsMap: mapping single molecule sequencing reads by locating the alignment starting positions.

BMC Bioinformatics. 2020 Aug 4;21(1):341. doi: 10.1186/s12859-020-03698-w.
