Suppr超能文献

最优间隙仿射对齐,时间复杂度为 O(s)。

Optimal gap-affine alignment in O(s) space.

机构信息

Computer Sciences Department, Barcelona Supercomputing Center, Barcelona 08034, Spain.

Departament d'Arquitectura de Computadors i Sistemes Operatius, Universitat Autònoma de Barcelona, Barcelona 08193, Spain.

出版信息

Bioinformatics. 2023 Feb 3;39(2). doi: 10.1093/bioinformatics/btad074.

Abstract

MOTIVATION

Pairwise sequence alignment remains a fundamental problem in computational biology and bioinformatics. Recent advances in genomics and sequencing technologies demand faster and scalable algorithms that can cope with the ever-increasing sequence lengths. Classical pairwise alignment algorithms based on dynamic programming are strongly limited by quadratic requirements in time and memory. The recently proposed wavefront alignment algorithm (WFA) introduced an efficient algorithm to perform exact gap-affine alignment in O(ns) time, where s is the optimal score and n is the sequence length. Notwithstanding these bounds, WFA's O(s2) memory requirements become computationally impractical for genome-scale alignments, leading to a need for further improvement.

RESULTS

In this article, we present the bidirectional WFA algorithm, the first gap-affine algorithm capable of computing optimal alignments in O(s) memory while retaining WFA's time complexity of O(ns). As a result, this work improves the lowest known memory bound O(n) to compute gap-affine alignments. In practice, our implementation never requires more than a few hundred MBs aligning noisy Oxford Nanopore Technologies reads up to 1 Mbp long while maintaining competitive execution times.

AVAILABILITY AND IMPLEMENTATION

All code is publicly available at https://github.com/smarco/BiWFA-paper.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.

摘要

动机

成对序列比对仍然是计算生物学和生物信息学中的一个基本问题。基因组学和测序技术的最新进展要求更快和可扩展的算法,以应对不断增加的序列长度。基于动态规划的经典成对比对算法在时间和内存方面受到强烈的二次要求限制。最近提出的波前比对算法 (WFA) 引入了一种有效的算法,可以在 O(ns) 的时间内执行精确的带隙仿射比对,其中 s 是最佳得分,n 是序列长度。尽管有这些限制,WFA 的 O(s2) 内存要求对于基因组规模的比对在计算上是不切实际的,因此需要进一步改进。

结果

在本文中,我们提出了双向 WFA 算法,这是第一个能够在 O(s) 内存中计算最优比对的带隙仿射算法,同时保留 WFA 的 O(ns) 时间复杂度。因此,这项工作将计算带隙仿射比对的已知最低内存限制 O(n) 提高到了 O(s)。在实践中,我们的实现对齐长达 1 Mbp 的嘈杂牛津纳米孔技术读取物所需的内存从未超过几百兆字节,同时保持有竞争力的执行时间。

可用性和实现

所有代码都可在 https://github.com/smarco/BiWFA-paper 上公开获得。

补充信息

补充数据可在 Bioinformatics 在线获得。

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b4c9/9940620/153bbe26d52a/btad074f1.jpg

文献AI研究员

20分钟写一篇综述,助力文献阅读效率提升50倍。

立即体验

用中文搜PubMed

大模型驱动的PubMed中文搜索引擎

马上搜索

文档翻译

学术文献翻译模型,支持多种主流文档格式。

立即体验