Fairy: fast approximate coverage for multi-sample metagenomic binning.

Affiliations

Department of Mathematics, University of Toronto, Toronto, Canada.

Computational Biology Department, Carnegie Mellon University, Pittsburgh, USA.

Publication information

Microbiome. 2024 Aug 14;12(1):151. doi: 10.1186/s40168-024-01861-6.

DOI: 10.1186/s40168-024-01861-6

PMID: 39143609

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11323348/

Abstract

BACKGROUND

Metagenomic binning, the clustering of assembled contigs that belong to the same genome, is a crucial step for recovering metagenome-assembled genomes (MAGs). Contigs are linked by exploiting consistent signatures along a genome, such as read coverage patterns. Using coverage from multiple samples leads to higher-quality MAGs; however, standard pipelines require all-to-all read alignments for multiple samples to compute coverage, becoming a key computational bottleneck.

RESULTS

We present fairy (https://github.com/bluenote-1577/fairy), an approximate coverage calculation method for metagenomic binning. Fairy is a fast k-mer-based alignment-free method. For multi-sample binning, fairy can be faster than read alignment and accurate enough for binning. Fairy is compatible with several existing binners on host and non-host-associated datasets. Using MetaBAT2, fairy recovers […] of MAGs with […] completeness and […] contamination relative to alignment with BWA. Notably, multi-sample binning with fairy is always better than single-sample binning using BWA ([…] more complete MAGs on average) while still being faster. For a public sediment metagenome project, we demonstrate that multi-sample binning recovers higher quality Asgard archaea MAGs than single-sample binning and that fairy's results are indistinguishable from read alignment.

CONCLUSIONS

Fairy is a new tool for approximately and quickly calculating multi-sample coverage for binning, resolving a computational bottleneck for metagenomics.
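The abstract does not describe fairy's algorithm in detail, so the following is only a minimal sketch of the general idea behind alignment-free, k-mer-based coverage, not fairy's actual implementation. It assumes reads and contigs are already loaded as plain strings, and all function names (`count_read_kmers`, `approx_coverage`, `coverage_matrix`) are hypothetical. For each sample, it counts canonical k-mers in the reads, then reports the median count of a contig's k-mers as that contig's approximate coverage. The result is one coverage value per contig per sample, which is the kind of table a multi-sample binner consumes.

```python
from collections import Counter
from statistics import median

ACGT = frozenset("ACGT")
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq[::-1].translate(COMPLEMENT)


def canonical_kmers(seq, k):
    """Yield the canonical form (the lexicographically smaller of a k-mer
    and its reverse complement) of every k-mer in seq, skipping k-mers
    that contain characters other than A, C, G, T."""
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if ACGT.issuperset(kmer):
            yield min(kmer, revcomp(kmer))


def count_read_kmers(reads, k):
    """Count canonical k-mers over all reads of one sample."""
    counts = Counter()
    for read in reads:
        counts.update(canonical_kmers(read, k))
    return counts


def approx_coverage(contig, read_counts, k):
    """Median read k-mer count over the contig's k-mers.
    This is a k-mer multiplicity, used here as a proxy for read coverage."""
    hits = [read_counts.get(km, 0) for km in canonical_kmers(contig, k)]
    return median(hits) if hits else 0.0


def coverage_matrix(contigs, samples, k=5):
    """contigs: {name: sequence}; samples: {sample_name: [reads]}.
    Returns {contig_name: [coverage in each sample, in sample order]}."""
    per_sample = [count_read_kmers(reads, k) for reads in samples.values()]
    return {
        name: [approx_coverage(seq, counts, k) for counts in per_sample]
        for name, seq in contigs.items()
    }
```

A toy call such as `coverage_matrix({"c1": "GATTACAGGCTCAA"}, {"A": ["GATTACAGGCTCAA"] * 3, "B": ["T" * 14]})` gives contig `c1` a coverage of 3 in sample A and 0 in sample B. The small default `k=5` is chosen only to keep the toy example readable; real tools use much longer k-mers and typically subsample them to reduce memory and time.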
