Centrifuger: lossless compression of microbial genomes for efficient and accurate metagenomic sequence classification.

Authors

Song Li, Langmead Ben

Affiliations

Department of Biomedical Data Science, Dartmouth College, Hanover, NH.

Department of Computer Science, Johns Hopkins University, Baltimore, MD.

Publication information

bioRxiv. 2023 Nov 17:2023.11.15.567129. doi: 10.1101/2023.11.15.567129.

Abstract

Centrifuger is an efficient taxonomic classification method that compares sequencing reads against a microbial genome database. In Centrifuger, the Burrows-Wheeler transformed genome sequences are losslessly compressed using a novel scheme called run-block compression. Run-block compression achieves sublinear space complexity and is effective at compressing diverse microbial databases like RefSeq while supporting fast rank queries. Combining this compression method with other strategies for compacting the Ferragina-Manzini (FM) index, Centrifuger reduces the memory footprint by half compared to other FM-index-based approaches. Furthermore, the lossless compression and the unconstrained match length help Centrifuger achieve greater accuracy than competing methods at lower taxonomic levels.
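
To make the idea of compressing a Burrows-Wheeler transformed sequence while still answering rank queries more concrete, the following is a minimal toy sketch in Python. It is not Centrifuger's implementation: the abstract does not describe the internal layout of run-block compression, so the class name `BlockedBWT`, the `block_size` parameter, the `$` sentinel, and the choice to store a single character for any block made up of one repeated symbol are all illustrative assumptions. The sketch only shows why long runs in the BWT can be stored compactly without losing information, and how `rank(c, i)`, the count of character `c` in the first `i` positions, can still be answered from per-block counts.

```python
def bwt(text: str) -> str:
    """Burrows-Wheeler transform via sorted rotations (toy inputs only)."""
    s = text + "$"  # assumption: '$' is a sentinel that does not occur in text
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rot[-1] for rot in rotations)


class BlockedBWT:
    """Hypothetical block-wise store: a uniform block keeps only its character."""

    def __init__(self, bwt_str: str, block_size: int = 4):
        self.n = len(bwt_str)
        self.block_size = block_size
        self.blocks = []  # ("run", char) or ("raw", substring)
        self.counts = []  # counts[i][c] = occurrences of c before block i
        running = {}
        for start in range(0, self.n, block_size):
            chunk = bwt_str[start:start + block_size]
            self.counts.append(dict(running))
            if len(set(chunk)) == 1:
                # Every symbol in the block is the same: store one character.
                self.blocks.append(("run", chunk[0]))
            else:
                # Mixed block: store it verbatim (no loss of information).
                self.blocks.append(("raw", chunk))
            for ch in chunk:
                running[ch] = running.get(ch, 0) + 1
        self.total = running

    def rank(self, ch: str, pos: int) -> int:
        """Count occurrences of ch in the first pos characters of the BWT."""
        pos = max(0, min(pos, self.n))
        i, offset = divmod(pos, self.block_size)
        if i == len(self.blocks):  # pos sits exactly past the final full block
            return self.total.get(ch, 0)
        base = self.counts[i].get(ch, 0)
        kind, data = self.blocks[i]
        if kind == "run":
            return base + (offset if data == ch else 0)
        return base + data[:offset].count(ch)


b = bwt("banana")
index = BlockedBWT(b, block_size=2)
print(b, index.rank("a", 6))  # prints: annb$aa 2
```

In this toy version the space saving comes entirely from blocks that fall inside a run; a practical index would presumably build the BWT from a suffix array and use bit vectors rather than Python dictionaries, which are used here only for readability.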

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4352/10680779/febd66f8424a/nihpp-2023.11.15.567129v1-f0001.jpg