Metagenomics-Toolkit: the flexible and efficient cloud-based metagenomics workflow featuring machine learning-enabled resource allocation.

Authors

Belmann Peter, Osterholz Benedikt, Kleinbölting Nils, Pühler Alfred, Schlüter Andreas, Sczyrba Alexander

Affiliations

IBG-5: Computational Metagenomics, Institute of Bio- and Geosciences (IBG), Research Center Jülich GmbH, D-52428 Jülich, Germany.

Computational Metagenomics Group, Faculty of Technology and Center for Biotechnology (CeBiTec), Bielefeld University, Universitätsstrasse 25, D-33615 Bielefeld, Germany.

Publication information

NAR Genom Bioinform. 2025 Jul 17;7(3):lqaf093. doi: 10.1093/nargab/lqaf093. eCollection 2025 Sep.

DOI:10.1093/nargab/lqaf093

PMID:40677915

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12267984/

Abstract

The metagenome analysis of complex environments with thousands of datasets, such as those in the Sequence Read Archive, requires substantial computational resources for it to be completed within a reasonable time frame. Efficient use of infrastructure is essential, and analyses must be fully reproducible with publicly available workflows to ensure transparency. Here, we introduce the Metagenomics-Toolkit, a scalable, data-agnostic workflow that automates the analysis of short and long metagenomic reads from Illumina and Oxford Nanopore Technology devices, respectively. The Metagenomics-Toolkit provides standard features such as quality control, assembly, binning, and annotation, along with unique capabilities including plasmid identification, recovery of unassembled microbial community members, and discovery of microbial interdependencies through dereplication, co-occurrence, and genome-scale metabolic modeling. Additionally, the Metagenomics-Toolkit includes a machine learning-optimized assembly step that adjusts peak RAM usage to match actual requirements, reducing the need for high-memory hardware. It can be executed on user workstations and includes optimizations for efficient cloud-based cluster execution. We compare the Metagenomics-Toolkit with five widely used metagenomics workflows and demonstrate its capabilities on 757 sewage metagenome datasets to investigate a possible sewage core microbiome. The Metagenomics-Toolkit is open source and available at https://github.com/metagenomics/metagenomics-tk.
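The abstract does not say which model or features the assembly step uses, so the following is only a minimal sketch of the general idea of learning peak RAM from earlier runs and requesting the smallest machine that fits. It is not the Metagenomics-Toolkit's implementation. The linear model, the training data, the 20% safety margin, and the list of machine sizes are all hypothetical.

```python
from bisect import bisect_left

# Hypothetical machine sizes in GB of RAM, sorted ascending (assumption).
FLAVORS_GB = [16, 32, 64, 128, 256, 512]


def fit_line(xs, ys):
    """Ordinary least squares fit of y = a * x + b; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b


def pick_flavor(predicted_gb, margin=1.2, flavors=FLAVORS_GB):
    """Return the smallest flavor that holds the prediction plus a margin."""
    needed = predicted_gb * margin
    i = bisect_left(flavors, needed)
    if i == len(flavors):
        raise ValueError(f"no flavor offers {needed:.1f} GB of RAM")
    return flavors[i]


# Made-up history of earlier assemblies: (input size in Gbp, peak RAM in GB).
history = [(1.0, 10.0), (2.0, 20.0), (4.0, 40.0), (8.0, 80.0)]
slope, intercept = fit_line([h[0] for h in history], [h[1] for h in history])


def predict_peak_ram_gb(input_gbp):
    """Predict peak RAM for a new dataset from the fitted line."""
    return slope * input_gbp + intercept


if __name__ == "__main__":
    predicted = predict_peak_ram_gb(5.0)
    print(f"predicted peak RAM: {predicted:.1f} GB")
    print(f"requested flavor:   {pick_flavor(predicted)} GB")
```

A real predictor would likely use richer features than input size alone and would need a retry path with more memory for cases where the prediction is too low.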

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0fe5/12267984/844404e8c218/lqaf093fig1.jpg
