DOE JGI Metagenome Workflow.

Authors

Clum Alicia, Huntemann Marcel, Bushnell Brian, Foster Brian, Foster Bryce, Roux Simon, Hajek Patrick P, Varghese Neha, Mukherjee Supratim, Reddy T B K, Daum Chris, Yoshinaga Yuko, O'Malley Ronan, Seshadri Rekha, Kyrpides Nikos C, Eloe-Fadrosh Emiley A, Chen I-Min A, Copeland Alex, Ivanova Natalia N

Affiliations

Department of Energy Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, California, USA

Publication information

mSystems. 2021 May 18;6(3):e00804-20. doi: 10.1128/mSystems.00804-20.

PMID:34006627

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8269246/

Abstract

The DOE Joint Genome Institute (JGI) Metagenome Workflow performs metagenome data processing, including assembly; structural, functional, and taxonomic annotation; and binning of metagenomic data sets that are subsequently included into the Integrated Microbial Genomes and Microbiomes (IMG/M) (I.-M. A. Chen, K. Chu, K. Palaniappan, A. Ratner, et al., Nucleic Acids Res, 49:D751-D763, 2021, https://doi.org/10.1093/nar/gkaa939) comparative analysis system and provided for download via the JGI data portal (https://genome.jgi.doe.gov/portal/). This workflow scales to run on thousands of metagenome samples per year, which can vary by the complexity of microbial communities and sequencing depth. Here, we describe the different tools, databases, and parameters used at different steps of the workflow to help with the interpretation of metagenome data available in IMG and to enable researchers to apply this workflow to their own data. We use 20 publicly available sediment metagenomes to illustrate the computing requirements for the different steps and highlight the typical results of data processing. The workflow modules for read filtering and metagenome assembly are available as a workflow description language (WDL) file (https://code.jgi.doe.gov/BFoster/jgi_meta_wdl). The workflow modules for annotation and binning are provided as a service to the user community at https://img.jgi.doe.gov/submit and require filling out the project and associated metadata descriptions in the Genomes OnLine Database (GOLD) (S. Mukherjee, D. Stamatis, J. Bertsch, G. Ovchinnikova, et al., Nucleic Acids Res, 49:D723-D733, 2021, https://doi.org/10.1093/nar/gkaa983).

The DOE JGI Metagenome Workflow is designed for processing metagenomic data sets starting from Illumina fastq files. It performs data preprocessing, error correction, assembly, structural and functional annotation, and binning. The results of processing are provided in several standard formats, such as fasta and gff, and can be used for subsequent integration into the Integrated Microbial Genomes and Microbiomes (IMG/M) system where they can be compared to a comprehensive set of publicly available metagenomes. As of 30 July 2020, 7,155 JGI metagenomes have been processed by the DOE JGI Metagenome Workflow.

Here, we present a metagenome workflow developed at the JGI that generates rich data in standard formats and has been optimized for downstream analyses ranging from assessment of the functional and taxonomic composition of microbial communities to genome-resolved metagenomics and the identification and characterization of novel taxa. This workflow is currently being used to analyze thousands of metagenomic data sets in a consistent and standardized manner.
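Because the results are delivered in standard formats such as fasta and gff, they can be inspected with a few lines of general-purpose code. The following is a minimal Python sketch, not part of the JGI workflow itself: it summarizes contig lengths from a FASTA file and counts feature types in a GFF file. The function names, the in-memory example records, and the `example_tool` source label are all hypothetical and chosen only for illustration; real workflow outputs will contain different identifiers and attributes.

```python
from collections import Counter
from typing import Dict, Iterable, List


def read_fasta_lengths(lines: Iterable[str]) -> Dict[str, int]:
    """Return {sequence_id: length} for FASTA-formatted lines."""
    lengths: Dict[str, int] = {}
    current = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # The sequence ID is the first whitespace-delimited token of the header.
            current = line[1:].split()[0]
            lengths[current] = 0
        elif current is not None:
            lengths[current] += len(line)
    return lengths


def n50(lengths: List[int]) -> int:
    """Length L such that contigs of length >= L cover at least half of the total."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    return 0  # empty input


def count_gff_feature_types(lines: Iterable[str]) -> Counter:
    """Count values of the third GFF column (feature type), skipping comments."""
    counts: Counter = Counter()
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 3:
            counts[fields[2]] += 1
    return counts


# Hypothetical, hand-written records used only to exercise the functions.
EXAMPLE_FASTA = [
    ">contig_1 example header",
    "ACGTACGTAC",
    ">contig_2",
    "ACGTAC",
    ">contig_3",
    "ACGT",
]

EXAMPLE_GFF = [
    "##gff-version 3",
    "\t".join(["contig_1", "example_tool", "CDS", "1", "9", ".", "+", "0", "ID=gene_1"]),
    "\t".join(["contig_1", "example_tool", "tRNA", "2", "8", ".", "-", ".", "ID=trna_1"]),
    "\t".join(["contig_2", "example_tool", "CDS", "1", "6", ".", "+", "0", "ID=gene_2"]),
]

contig_lengths = read_fasta_lengths(EXAMPLE_FASTA)
print("contigs:", len(contig_lengths))
print("total bases:", sum(contig_lengths.values()))
print("N50:", n50(list(contig_lengths.values())))
print("feature types:", dict(count_gff_feature_types(EXAMPLE_GFF)))
```

To apply the same functions to downloaded files, pass an open file handle in place of the example lists, for instance `read_fasta_lengths(open(path))` with `path` pointing at a local FASTA file.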

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2918/8269246/8d37d185acd1/msystems.00804-20-f001.jpg

Similar articles

The standard operating procedure of the DOE-JGI Metagenome Annotation Pipeline (MAP v.4).

Stand Genomic Sci. 2016 Feb 24;11:17. doi: 10.1186/s40793-016-0138-x. eCollection 2016.

IMG/M 4 version of the integrated metagenome comparative analysis system.

Nucleic Acids Res. 2014 Jan;42(Database issue):D568-73. doi: 10.1093/nar/gkt919. Epub 2013 Oct 16.

IMG/M: the integrated metagenome data management and comparative analysis system.

Nucleic Acids Res. 2012 Jan;40(Database issue):D123-9. doi: 10.1093/nar/gkr975. Epub 2011 Nov 15.

IMG/M v.5.0: an integrated data management and comparative analysis system for microbial genomes and microbiomes.

Nucleic Acids Res. 2019 Jan 8;47(D1):D666-D677. doi: 10.1093/nar/gky901.

IMG/M: integrated genome and metagenome comparative data analysis system.

Nucleic Acids Res. 2017 Jan 4;45(D1):D507-D516. doi: 10.1093/nar/gkw929. Epub 2016 Oct 13.

The DOE-JGI Standard Operating Procedure for the Annotations of Microbial Genomes.

Stand Genomic Sci. 2009 Jul 20;1(1):63-7. doi: 10.4056/sigs.632.

The IMG/M data management and analysis system v.7: content updates and new features.

Nucleic Acids Res. 2023 Jan 6;51(D1):D723-D732. doi: 10.1093/nar/gkac976.

IMG/PR: a database of plasmids from genomes and metagenomes with rich annotations and metadata.

Nucleic Acids Res. 2024 Jan 5;52(D1):D164-D173. doi: 10.1093/nar/gkad964.

IMG/M: a data management and analysis system for metagenomes.

Nucleic Acids Res. 2008 Jan;36(Database issue):D534-8. doi: 10.1093/nar/gkm869. Epub 2007 Oct 11.

Cited by

Adaptive pangenomic remodeling in the Azolla cyanobiont amid a transient microbiome.

ISME J. 2025 Jan 2;19(1). doi: 10.1093/ismejo/wraf154.

Suppression of gut colonization by multidrug-resistant Escherichia coli clinical isolates through cooperative niche exclusion.

Nat Commun. 2025 Jul 1;16(1):5426. doi: 10.1038/s41467-025-61327-7.

Laboratory mice engrafted with natural gut microbiota possess a wildling-like phenotype.

Nat Commun. 2025 Jun 12;16(1):5301. doi: 10.1038/s41467-025-60554-2.

Bacterial and fungal composition and exometabolites control the development and persistence of soil water repellency.

ISME Commun. 2025 May 20;5(1):ycaf084. doi: 10.1093/ismeco/ycaf084. eCollection 2025 Jan.

Climate-driven succession in marine microbiome biodiversity and biogeochemical function.

Nat Commun. 2025 Apr 25;16(1):3926. doi: 10.1038/s41467-025-59382-1.

Global Archaeal Diversity Revealed Through Massive Data Integration: Uncovering Just Tip of Iceberg.

Microorganisms. 2025 Mar 5;13(3):598. doi: 10.3390/microorganisms13030598.

Size-fractionated metagenomic depth profiles from two sulfidic stations in the Chesapeake Bay.

Microbiol Resour Announc. 2025 Apr 10;14(4):e0008425. doi: 10.1128/mra.00084-25. Epub 2025 Mar 25.

Bacterial response to the 2021 Orange County, California, oil spill was episodic but subtle relative to natural fluctuations.

Microbiol Spectr. 2025 Mar 14;13(5):e0226724. doi: 10.1128/spectrum.02267-24.

Laminarin stimulates single cell rates of sulfate reduction whereas oxygen inhibits transcriptomic activity in coastal marine sediment.

ISME J. 2025 Jan 2;19(1). doi: 10.1093/ismejo/wraf042.

Environmental matrix and moisture influence soil microbial phenotypes in a simplified porous media incubation.

mSystems. 2025 Mar 18;10(3):e0161624. doi: 10.1128/msystems.01616-24. Epub 2025 Feb 24.

References

Shotgun metagenomic analysis of microbial communities from the Loxahatchee nature preserve in the Florida Everglades.

Environ Microbiome. 2020 Jan 21;15(1):2. doi: 10.1186/s40793-019-0352-4.

Genomes OnLine Database (GOLD) v.8: overview and updates.

Nucleic Acids Res. 2021 Jan 8;49(D1):D723-D733. doi: 10.1093/nar/gkaa983.

The IMG/M data management and analysis system v.6.0: new tools and advanced capabilities.

Nucleic Acids Res. 2021 Jan 8;49(D1):D751-D763. doi: 10.1093/nar/gkaa939.

Improved metagenomic analysis with Kraken 2.

Genome Biol. 2019 Nov 28;20(1):257. doi: 10.1186/s13059-019-1891-0.

GTDB-Tk: a toolkit to classify genomes with the Genome Taxonomy Database.

Bioinformatics. 2019 Nov 15;36(6):1925-7. doi: 10.1093/bioinformatics/btz848.

MGnify: the microbiome analysis resource in 2020.

Nucleic Acids Res. 2020 Jan 8;48(D1):D570-D578. doi: 10.1093/nar/gkz1035.

tRNAscan-SE: Searching for tRNA Genes in Genomic Sequences.

Methods Mol Biol. 2019;1962:1-14. doi: 10.1007/978-1-4939-9173-0_1.

SqueezeMeta, A Highly Portable, Fully Automatic Metagenomic Analysis Pipeline.

Front Microbiol. 2019 Jan 24;9:3349. doi: 10.3389/fmicb.2018.03349. eCollection 2018.

CATH: expanding the horizons of structure-based functional annotations for genome sequences.

Nucleic Acids Res. 2019 Jan 8;47(D1):D280-D284. doi: 10.1093/nar/gky1097.

Species-level functional profiling of metagenomes and metatranscriptomes.

Nat Methods. 2018 Nov;15(11):962-968. doi: 10.1038/s41592-018-0176-y. Epub 2018 Oct 30.
