Department of Biology, Boston University, Boston, MA, 02215, USA.
Department of Mathematics, University of Colorado, Boulder, Boulder, CO, 80309, USA.
F1000Res. 2021 Apr 19;10:299. doi: 10.12688/f1000research.51494.2. eCollection 2021.
The largest dataset of soil metagenomes has recently been released by the National Ecological Observatory Network (NEON), which performs annual shotgun sequencing of soils at 47 sites across the United States. NEON serves as a valuable educational resource, thanks to its open data and programming tutorials, but there is currently no introductory tutorial for accessing and analyzing the soil shotgun metagenomic dataset. Here, we describe methods for processing raw soil metagenome sequencing reads using a bioinformatics pipeline tailored to the high complexity and diversity of the soil microbiome. We describe the rationale, necessary resources, and implementation of steps such as cleaning raw reads, taxonomic classification, assembly into contigs or genomes, annotation of predicted genes using custom protein databases, and exporting data for downstream analysis. The workflow presented here aims to increase the accessibility of NEON's shotgun metagenome data, which can provide important clues about soil microbial communities and their ecological roles.