Sigmoni: classification of nanopore signal with a compressed pangenome index.

Authors

Shivakumar Vikram S, Ahmed Omar Y, Kovaka Sam, Zakeri Mohsen, Langmead Ben

Affiliation

Department of Computer Science, Johns Hopkins University.

Publication

bioRxiv. 2023 Aug 30:2023.08.15.553308. doi: 10.1101/2023.08.15.553308.

Abstract

Improvements in nanopore sequencing necessitate efficient classification methods, including pre-filtering and adaptive sampling algorithms that enrich for reads of interest. Signal-based approaches circumvent the computational bottleneck of basecalling, but past methods for signal-based classification do not scale efficiently to large, repetitive references like pangenomes, limiting their utility to partial references or individual genomes. We introduce Sigmoni: a rapid, multiclass classification method based on the r-index that scales to references of hundreds of Gbps. Sigmoni quantizes nanopore signal into a discrete alphabet of picoamp ranges. It performs rapid, approximate matching using matching statistics, classifying reads based on distributions of picoamp matching statistics and co-linearity statistics. Sigmoni is 10-100× faster than previous methods for adaptive sampling in host depletion experiments with improved accuracy, and can query reads against large microbial or human pangenomes.
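The two core ideas in the abstract, quantizing signal into picoamp ranges and scoring reads by matching statistics, can be illustrated with a toy sketch. This is not Sigmoni's implementation: the function names, the equal-width binning, the bin limits, and the mean-based decision rule are all assumptions made for illustration, and the matching statistics are computed by brute-force substring search rather than with an r-index. Co-linearity statistics are omitted.

```python
from typing import Dict, List, Sequence


def quantize(signal_pa: Sequence[float], lo: float, hi: float, n_bins: int) -> str:
    """Map each picoamp value to a letter naming an equal-width bin in [lo, hi].

    Assumption: equal-width bins and at most 26 bins (one letter each).
    """
    if not 1 <= n_bins <= 26:
        raise ValueError("n_bins must be between 1 and 26")
    width = (hi - lo) / n_bins
    letters = []
    for x in signal_pa:
        clipped = min(max(x, lo), hi)  # clamp outliers into the range
        b = min(int((clipped - lo) / width), n_bins - 1)
        letters.append(chr(ord("a") + b))
    return "".join(letters)


def matching_statistics(query: str, reference: str) -> List[int]:
    """ms[i] = length of the longest prefix of query[i:] that occurs in reference.

    Brute force for clarity; a compressed index would replace this search.
    """
    ms = []
    for i in range(len(query)):
        length = 0
        while i + length < len(query) and query[i:i + length + 1] in reference:
            length += 1
        ms.append(length)
    return ms


def classify(query: str, references: Dict[str, str]) -> str:
    """Hypothetical rule: pick the class whose reference gives the highest mean MS."""
    if not query:
        raise ValueError("query must be non-empty")

    def mean_ms(name: str) -> float:
        ms = matching_statistics(query, references[name])
        return sum(ms) / len(ms)

    return max(references, key=mean_ms)


if __name__ == "__main__":
    read = quantize([62.0, 80.0, 95.0, 61.0], lo=60.0, hi=120.0, n_bins=4)
    refs = {"host": "ddddcccc", "target": "aabcabb"}
    print(read, matching_statistics(read, refs["target"]), classify(read, refs))
```

Long matching statistics indicate that stretches of the quantized read occur in a reference, so a read drawn from a given class tends to score higher against that class's reference than against the others.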

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c5a0/10472758/d6a27fb32642/nihpp-2023.08.15.553308v2-f0001.jpg
