Howe Adina, Chain Patrick S G
GERMS Laboratory, Department of Agricultural and Biosystems Engineering, Iowa State University, Ames, IA, USA.
Bioinformatics and Analytics Team, Bioscience Division, Los Alamos National Laboratory, Los Alamos, NM, USA.
Front Microbiol. 2015 Jul 9;6:678. doi: 10.3389/fmicb.2015.00678. eCollection 2015.
Metagenomic investigations hold great promise for informing the genetics, physiology, and ecology of environmental microorganisms. Current challenges for metagenomic analysis are related to our ability to connect the dots between sequencing reads, their population of origin, and their encoding functions. Assembly-based methods reduce dataset size by extending overlapping reads into larger contiguous sequences (contigs), providing contextual information for genetic sequences that does not rely on existing references. These methods, however, tend to be computationally intensive and are also challenged by sequencing errors and by genomic repeats. While numerous tools have been developed based on these methodological concepts, they present confounding choices and training requirements to metagenomic investigators. To make assembly tools more accessible, this review also includes an IPython Notebook metagenomic assembly tutorial. This tutorial has instructions for execution on any operating system using Amazon Elastic Cloud Compute and guides users through downloading the reads of a mock microbiome metagenome, assembling them, and mapping them to the resulting contigs. Despite its challenges, metagenomic analysis has already revealed novel insights into many environments on Earth. As software, training, and data continue to emerge, metagenomic data access and its discoveries will continue to grow.
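The idea of extending overlapping reads into contigs can be illustrated with a minimal sketch. This is a toy greedy overlap merger written for illustration only; it is not the algorithm of any particular assembler, and the function names (`overlap`, `greedy_assemble`), the `min_len` threshold, and the example reads are all assumptions introduced here. It assumes error-free reads from a single strand, which real metagenomic data do not satisfy.

```python
def overlap(a: str, b: str, min_len: int) -> int:
    """Return the length of the longest suffix of `a` that equals a prefix
    of `b`, provided it is at least `min_len` long; otherwise return 0."""
    max_len = min(len(a), len(b))
    for k in range(max_len, min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of sequences with the longest overlap
    until no pair overlaps by at least `min_len` bases."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i == j:
                    continue
                k = overlap(a, b, min_len)
                if k > best_k:
                    best_k, best_i, best_j = k, i, j
        if best_k == 0:
            return contigs  # nothing left to merge
        # Join the two sequences, writing the shared bases only once.
        merged = contigs[best_i] + contigs[best_j][best_k:]
        contigs = [c for idx, c in enumerate(contigs)
                   if idx not in (best_i, best_j)] + [merged]


# Hypothetical reads: three overlap into one contig, one stays separate.
reads = ["ATGGCGT", "GCGTGCA", "TGCAATC", "CCCCCC"]
print(greedy_assemble(reads))
```

The all-against-all comparison in each round hints at why assembly is computationally intensive. The sketch also shows why sequencing errors and repeats are problematic: a single mismatched base prevents an exact overlap from being found, and a repeated sequence can produce equally good overlaps that join reads from unrelated regions.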