Characterization of microbial dark matter at scale with MetaSBT and taxonomy-aware Sequence Bloom Trees.

Author information

Cumbo Fabio, Blankenberg Daniel

Publication information

bioRxiv. 2025 Aug 30:2025.08.25.672238. doi: 10.1101/2025.08.25.672238.

Abstract

Metagenomics has become a powerful tool for studying microbial communities, allowing researchers to investigate microbial diversity within complex environmental samples. Recent advances in sequencing technology have enabled the recovery of near-complete microbial genomes, known as metagenome-assembled genomes (MAGs), directly from metagenomic samples. However, accurately characterizing these genomes remains a significant challenge because of sequencing errors, incomplete assembly, and contamination. Here we present MetaSBT, a new tool for organizing, indexing, and characterizing microbial reference genomes and MAGs. It identifies clusters of genomes at all seven taxonomic levels, from kingdom down to species, using the Sequence Bloom Tree (SBT) data structure, which relies on Bloom filters (BFs) to index large numbers of genomes by their k-mer composition. We have built an initial set of databases comprising over 190 thousand viral genomes from NCBI GenBank and public sources, grouped into sequence-consistent clusters at different taxonomic levels, making MetaSBT the first software solution for classifying viruses at different ranks, including still-unknown ones. This yields over 40 thousand species clusters, of which ∼80% do not match any known viral species in reference databases to date. Furthermore, we show how our databases can serve as a new basis for existing quantitative metagenomic profilers to unlock the detection of unknown microbes and the estimation of their abundance in metagenomic samples. Finally, the framework is released as open source and, along with its public databases, is fully integrated into the Galaxy Platform, enabling broad accessibility.
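To make the indexing idea concrete, here is a minimal sketch of a Sequence Bloom Tree in Python. It is not MetaSBT's implementation or API. The k-mer length, filter size, hash scheme, node names, and the 0.8 query threshold are all assumptions chosen for illustration. Each leaf holds a Bloom filter of one genome's k-mers, each internal node holds the bitwise union of its children's filters, and a query descends only into subtrees whose filter contains enough of the query's k-mers.

```python
import hashlib

K = 4    # toy k-mer length (assumption; real tools use much larger k)
M = 256  # Bloom filter size in bits (assumption)
H = 3    # number of hash functions (assumption)


def kmers(seq, k=K):
    """Return the set of all k-mers in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def positions(kmer, m=M, h=H):
    """Map a k-mer to h bit positions using salted SHA-256 digests."""
    return [
        int.from_bytes(hashlib.sha256(f"{i}:{kmer}".encode()).digest()[:8], "big") % m
        for i in range(h)
    ]


def bloom(seq):
    """Build a Bloom filter for a sequence, stored as an integer bitmask."""
    bits = 0
    for km in kmers(seq):
        for p in positions(km):
            bits |= 1 << p
    return bits


def contains(bits, kmer):
    """True if the k-mer may be present (false positives are possible)."""
    return all((bits >> p) & 1 for p in positions(kmer))


class Node:
    """A tree node: leaves index one genome, internal nodes a whole cluster."""

    def __init__(self, name, bits, children=()):
        self.name = name
        self.bits = bits
        self.children = list(children)


def leaf(name, seq):
    return Node(name, bloom(seq))


def merge(name, children):
    """Internal node whose filter is the bitwise OR of its children's filters."""
    bits = 0
    for child in children:
        bits |= child.bits
    return Node(name, bits, children)


def query(node, seq, threshold=0.8):
    """Return names of leaves holding at least `threshold` of the query k-mers."""
    qs = kmers(seq)
    if not qs:
        return []
    hits = sum(contains(node.bits, km) for km in qs)
    if hits / len(qs) < threshold:
        return []  # prune this whole subtree
    if not node.children:
        return [node.name]
    found = []
    for child in node.children:
        found.extend(query(child, seq, threshold))
    return found


# Hypothetical two-genome cluster; names and sequences are made up.
genome_a = leaf("species_A", "ACGTACGTGGCCTTAA")
genome_b = leaf("species_B", "TTTTGGGGCCCCAAAA")
genus = merge("genus_X", [genome_a, genome_b])
print(query(genus, "ACGTACGTGGCC"))
```

Because a Bloom filter never reports a false negative, a genome that truly contains the query is always reached. With filters this small, false positives can add extra leaves to the result. In a taxonomy-aware tree, the internal nodes would correspond to taxonomic ranks rather than the arbitrary grouping used here.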

IMPORTANCE

The MetaSBT framework and its databases, together with their integration into the Galaxy Platform, provide a powerful resource for microbial research. MetaSBT offers a scalable approach for classifying microbial genomes, including previously unknown ones. This facilitates the discovery and characterization of novel taxa, which is crucial for expanding our knowledge of microbial diversity and its implications for host health and environmental factors. Furthermore, MetaSBT databases can serve as a reference base for other state-of-the-art tools, enhancing their ability to identify, analyze, and classify unknown microbes in metagenomic samples.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b63f/12407952/f2a7e1e184b8/nihpp-2025.08.25.672238v1-f0001.jpg
