Fairy: fast approximate coverage for multi-sample metagenomic binning.

Affiliations

Department of Mathematics, University of Toronto, Toronto, Canada.

Computational Biology Department, Carnegie Mellon University, Pittsburgh, USA.

Publication information

Microbiome. 2024 Aug 14;12(1):151. doi: 10.1186/s40168-024-01861-6.

Abstract

BACKGROUND

Metagenomic binning, the clustering of assembled contigs that belong to the same genome, is a crucial step for recovering metagenome-assembled genomes (MAGs). Contigs are linked by exploiting consistent signatures along a genome, such as read coverage patterns. Using coverage from multiple samples leads to higher-quality MAGs; however, standard pipelines require all-to-all read alignments for multiple samples to compute coverage, becoming a key computational bottleneck.

RESULTS

We present fairy (https://github.com/bluenote-1577/fairy), an approximate coverage calculation method for metagenomic binning. Fairy is a fast k-mer-based alignment-free method. For multi-sample binning, fairy can be faster than read alignment and accurate enough for binning. Fairy is compatible with several existing binners on host and non-host-associated datasets. Using MetaBAT2, fairy recovers of MAGs with completeness and contamination relative to alignment with BWA. Notably, multi-sample binning with fairy is always better than single-sample binning using BWA ( more complete MAGs on average) while still being faster. For a public sediment metagenome project, we demonstrate that multi-sample binning recovers higher quality Asgard archaea MAGs than single-sample binning and that fairy's results are indistinguishable from read alignment.
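
To make the idea of k-mer-based, alignment-free coverage concrete, the following is a minimal illustrative sketch in Python. It is not fairy's implementation: the function names (`kmers`, `approx_coverage`, `coverage_vector`), the exact k-mer counting, the forward-strand-only handling, and the use of the median are all assumptions chosen for brevity. It estimates a contig's coverage in each sample from how often the contig's k-mers occur in that sample's reads, without aligning any read.

```python
from collections import Counter
from statistics import median


def kmers(seq, k):
    """Yield every k-mer of seq (forward strand only, for simplicity)."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def approx_coverage(contig, reads, k=5):
    """Hypothetical estimator: median count, among the reads' k-mers,
    of each k-mer found in the contig. No alignment is performed."""
    read_counts = Counter(km for read in reads for km in kmers(read, k))
    counts = [read_counts[km] for km in kmers(contig, k)]
    return median(counts) if counts else 0


def coverage_vector(contig, samples, k=5):
    """One coverage estimate per sample; vectors like this are what a
    binner would compare across contigs."""
    return [approx_coverage(contig, reads, k) for reads in samples]


if __name__ == "__main__":
    contig = "ACGTTGCAAGCT"
    samples = [[contig] * 3, [contig], []]  # toy reads for three samples
    print(coverage_vector(contig, samples, k=4))
```

In this toy setting, contigs from the same genome would tend to have similar coverage vectors across samples, which is the signal multi-sample binners exploit.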

CONCLUSIONS

Fairy is a new tool for approximately and quickly calculating multi-sample coverage for binning, resolving a computational bottleneck for metagenomics.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7ffd/11323348/f11c7c164ded/40168_2024_1861_Fig1_HTML.jpg
