LAMSA: fast split read alignment with long approximate matches.

Affiliation

Center for Bioinformatics, Harbin Institute of Technology, Harbin, Heilongjiang 150001, China.

Publication information

Bioinformatics. 2017 Jan 15;33(2):192-201. doi: 10.1093/bioinformatics/btw594. Epub 2016 Sep 25.

Abstract

MOTIVATION

Read length is continuously increasing with the development of novel high-throughput sequencing technologies, which has enormous potential for cutting-edge genomic studies. However, longer reads are more likely than shorter reads to span the breakpoints of structural variants (SVs). This may greatly influence read alignment, since most state-of-the-art aligners are designed to handle relatively small variants within a co-linear alignment framework. Meanwhile, long read alignment is still not as efficient as short read alignment, which could also become a bottleneck for its upcoming wide application.

RESULTS

We propose the long approximate matches-based split aligner (LAMSA), a novel split read alignment approach. It takes advantage of the rarity of SVs to implement a specifically designed two-step strategy. LAMSA first splits the read into relatively long fragments and aligns them co-linearly, which resolves small variations and sequencing errors and mitigates the effect of repeats. The fragment alignments are then used by a sparse dynamic programming-based split alignment approach to handle large or non-co-linear variants. We benchmarked LAMSA on simulated and real datasets with various read lengths and sequencing error rates. The results demonstrate that it is substantially faster than state-of-the-art long read aligners, while also handling various categories of SVs well.
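The two-step idea can be illustrated with a minimal sketch. This is not LAMSA's implementation: the names `FragmentHit`, `split_read`, and `chain_hits`, the fixed fragment length, and the capped diagonal-shift penalty are all assumptions made for illustration. The first function cuts a read into long fragments; the second chains already-computed fragment alignments with a dynamic program that visits only the hits (hence "sparse"), so a large jump on the reference, as an SV would cause, costs a bounded penalty instead of breaking the chain.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class FragmentHit:
    """One co-linear alignment of a read fragment (hypothetical record)."""
    read_start: int  # offset of the fragment within the read
    ref_start: int   # reference position where the fragment aligned
    length: int      # fragment length in bases
    score: int       # alignment score of the fragment on its own


def split_read(read: str, fragment_len: int) -> List[Tuple[int, str]]:
    """Step 1: cut the read into consecutive fragments (last may be shorter)."""
    return [(i, read[i:i + fragment_len])
            for i in range(0, len(read), fragment_len)]


def chain_hits(hits: List[FragmentHit],
               gap_penalty: int = 1,
               max_penalty: int = 50) -> List[FragmentHit]:
    """Step 2: pick the best-scoring chain of fragment hits.

    The DP runs over hits only, not over a full read-by-reference matrix.
    The penalty grows with the diagonal shift between two hits but is
    capped (an assumed scoring choice), so large jumps stay chainable.
    """
    if not hits:
        return []
    hits = sorted(hits, key=lambda h: (h.read_start, h.ref_start))
    best = [h.score for h in hits]   # best chain score ending at each hit
    prev = [-1] * len(hits)          # back-pointers for traceback
    for j, cur in enumerate(hits):
        for i in range(j):
            p = hits[i]
            if p.read_start + p.length > cur.read_start:
                continue  # hits overlapping on the read cannot both be used
            shift = abs((cur.ref_start - cur.read_start)
                        - (p.ref_start - p.read_start))
            cand = best[i] + cur.score - min(max_penalty, gap_penalty * shift)
            if cand > best[j]:
                best[j], prev[j] = cand, i
    j = max(range(len(hits)), key=best.__getitem__)
    chain = []
    while j != -1:
        chain.append(hits[j])
        j = prev[j]
    return chain[::-1]
```

In this sketch, two hits on the same diagonal chain at no cost, while a hit several kilobases away pays only the capped penalty, which is one simple way to let a chain cross a breakpoint. A real aligner would also need to handle strand, overlaps on the reference, and gap filling between fragments.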

AVAILABILITY AND IMPLEMENTATION

LAMSA is available at https://github.com/hitbc/LAMSA

Contact: Ydwang@hit.edu.cn

Supplementary information: Supplementary data are available at Bioinformatics online.

