Community Cyberinfrastructure for Marine Microbial Ecology Research and Analysis, California Institute for Telecommunications and Information Technology, University of California San Diego, La Jolla, California, United States of America.
PLoS Comput Biol. 2010 Feb 26;6(2):e1000667. doi: 10.1371/journal.pcbi.1000667.
Metagenomics is a discipline that enables the genomic study of uncultured microorganisms. Faster, cheaper sequencing technologies and the ability to sequence uncultured microbes sampled directly from their habitats are expanding and transforming our view of the microbial world. Distilling meaningful information from the millions of new genomic sequences presents a serious challenge to bioinformaticians. In cultured microbes, the genomic data come from a single clone, making sequence assembly and annotation tractable. In metagenomics, the data come from heterogeneous microbial communities, sometimes containing more than 10,000 species, with the sequence data being noisy and partial. From sampling, to assembly, to gene calling and function prediction, bioinformatics faces new demands in interpreting voluminous, noisy, and often partial sequence data. Although metagenomics is a relative newcomer to science, the past few years have seen an explosion in computational methods applied to metagenomic-based research. It is therefore not within the scope of this article to provide an exhaustive review. Rather, we provide here a concise yet comprehensive introduction to the current computational requirements presented by metagenomics, and review the recent progress made. We also note whether there is software that implements any of the methods presented here, and briefly review its utility. Nevertheless, it would be useful if readers of this article would avail themselves of the comment section provided by this journal, and relate their own experiences. Finally, the last section of this article provides a few representative studies illustrating different facets of recent scientific discoveries made using metagenomics.