Movi: A fast and cache-efficient full-text pangenome index.

Authors

Zakeri Mohsen, Brown Nathaniel K, Ahmed Omar Y, Gagie Travis, Langmead Ben

Affiliations

Department of Computer Science, Johns Hopkins University, Baltimore, MD 21218, US.

Faculty of Computer Science, Dalhousie University, Halifax, NS B3H 4R2, Canada.

Publication details

iScience. 2024 Nov 27;27(12):111464. doi: 10.1016/j.isci.2024.111464. eCollection 2024 Dec 20.

Abstract

Pangenome indexes are promising tools for many applications, including classification of nanopore sequencing reads. The move structure is a compressed-index data structure based on the Burrows-Wheeler Transform (BWT). It simultaneously offers O(1)-time queries and O(r) space, where r is the number of BWT runs (maximal sequences of consecutive identical characters). We developed Movi, based on the move structure, for indexing and querying pangenomes. Movi scales very well for repetitive text, as its size grows only with r. Movi computes sophisticated matching queries used for classification, such as pseudo-matching lengths and backward search, up to 30 times faster than existing methods by minimizing the number of cache misses and by using memory prefetching to achieve a degree of latency hiding. Movi's fast, constant-time query loop makes it well suited to real-time applications like adaptive sampling for nanopore sequencing, where decisions must be made within a small and predictable time interval.
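To make the move-structure idea concrete, here is a minimal Python sketch, assuming a tiny text that ends with a unique `$` terminator. It is not Movi's code. The names `bwt`, `c_array`, `naive_lf`, `build_move_table`, `move_lf` and `invert` are hypothetical. The table has one row per BWT run (so r rows), and each row stores the run start, its character, the LF value of the run head, and the index of the run containing that LF value. Because LF is order-preserving within a run, one LF step is an offset addition followed by a short "fast-forward" scan. The sketch builds the BWT naively and omits the run-splitting (balancing) step that the move structure uses to bound fast-forwarding to a constant, so its loop is not guaranteed O(1). It also does not implement backward search, pseudo-matching lengths, cache-aware layout or prefetching.

```python
from bisect import bisect_right


def bwt(text: str) -> str:
    """Naive BWT via sorted rotations (toy sizes only); text must end with a unique '$'."""
    rotations = sorted(text[i:] + text[:i] for i in range(len(text)))
    return "".join(rot[-1] for rot in rotations)


def c_array(L: str) -> dict:
    """C[c] = number of characters in L strictly smaller than c."""
    C, total = {}, 0
    for c in sorted(set(L)):
        C[c] = total
        total += L.count(c)
    return C


def naive_lf(L: str, j: int) -> int:
    """Reference LF: C[c] + occurrences of c = L[j] in L[0:j]."""
    c = L[j]
    return c_array(L)[c] + L[:j].count(c)


def build_move_table(L: str):
    """Hypothetical move table: one row per BWT run, so r rows in total."""
    starts = [j for j in range(len(L)) if j == 0 or L[j - 1] != L[j]]
    chars = [L[s] for s in starts]
    ends = starts[1:] + [len(L)]
    C = c_array(L)
    seen = {c: 0 for c in C}
    lf_heads = []
    for s, e, c in zip(starts, ends, chars):
        lf_heads.append(C[c] + seen[c])  # LF of the run head
        seen[c] += e - s  # every position in this run holds c
    # Index of the run that contains each LF(head) value.
    dest_run = [bisect_right(starts, q) - 1 for q in lf_heads]
    return starts, chars, lf_heads, dest_run


def move_lf(table, pos: int, run: int):
    """One LF step using the row of `run`, plus a fast-forward scan (unbalanced here)."""
    starts, _, lf_heads, dest_run = table
    # Positions inside one run map to consecutive rows, so LF is an offset add.
    new_pos = lf_heads[run] + (pos - starts[run])
    new_run = dest_run[run]
    # Fast-forward to the run that actually contains new_pos.
    while new_run + 1 < len(starts) and starts[new_run + 1] <= new_pos:
        new_run += 1
    return new_pos, new_run


def invert(L: str) -> str:
    """Recover the text by walking LF from row 0 (the rotation starting with '$')."""
    table = build_move_table(L)
    chars = table[1]
    pos, run, out = 0, 0, []
    for _ in range(len(L) - 1):
        out.append(chars[run])  # L[pos] equals the character of its run
        pos, run = move_lf(table, pos, run)
    return "".join(reversed(out)) + "$"


if __name__ == "__main__":
    L = bwt("banana$")
    print(L, "r =", len(build_move_table(L)[0]))  # annb$aa r = 5
    print(invert(L))  # banana$
```

Each step reads mostly a single table row, which illustrates why a run-indexed layout can keep memory accesses per query low; how Movi actually lays out and prefetches its rows is not shown here.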

Graphical abstract: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6ae1/11696632/5940f8658d16/fx1.jpg