Brief Bioinform. 2019 Jul 19;20(4):1140-1150. doi: 10.1093/bib/bbx098.
Metagenomic samples are snapshots of complex ecosystems at work. They comprise hundreds of known and unknown species, contain multiple strain variants and vary greatly within and across environments. Many microbes found in microbial communities are not easily grown in culture, making their DNA sequence our only clue to their evolutionary history and biological function. Metagenomic assembly is a computational process aimed at reconstructing genes and genomes from metagenomic mixtures. Current methods have made significant strides in reconstructing DNA segments comprising operons, tandem gene arrays and syntenic blocks. Sequencing technologies that produce shorter reads at higher throughput have become the de facto standard in the field. Sequencers are now able to generate billions of short reads in only a few days. Multiple metagenomic assembly strategies, pipelines and assemblers have appeared in recent years. Owing to the inherent complexity of metagenome assembly, metagenome assemblies contain errors regardless of the assembly algorithm and sequencing method. Recent developments in assembly validation tools have played a pivotal role in improving metagenomic assemblers. Here, we survey recent progress in the field of metagenomic assembly, provide an overview of key approaches for genomic and metagenomic assembly validation and demonstrate the insights that can be derived from assemblies through the use of assembly validation strategies. We also discuss the potential impact of long-read technologies in metagenomics. We conclude with a discussion of future challenges and opportunities in the field of metagenomic assembly and validation.