Bioinformatics and Medical Informatics, San Diego State University, San Diego, California, USA.
National Center for Genome Analysis Support, Indiana University, Bloomington, Indiana, USA.
BMC Genomics. 2017 Nov 28;18(1):915. doi: 10.1186/s12864-017-4294-1.
BACKGROUND: Microbiome/host interactions describe characteristics that affect the host's health. Shotgun metagenomics involves sequencing a random subset of the microbiome to analyze its taxonomic and metabolic potential. Reconstructing genomes from the DNA fragments in a metagenome (the results are called metagenome-assembled genomes) assigns unknown fragments to taxa and functions and facilitates the discovery of novel organisms. Genome reconstruction combines sequence assembly with the sorting of assembled sequences into bins, each characteristic of a genome. However, the composition of the microbial community, including its taxonomic and phylogenetic diversity, may influence genome reconstruction. We determined the optimal reconstruction method for four microbiome projects that varied in sequencing platform (IonTorrent and Illumina), diversity (high or low), and environment (coral reefs and kelp forests), using a set of parameters to select the optimal assembly and binning tools.

METHODS: We tested the effects of the assembly and binning processes on population genome reconstruction using 105 marine metagenomes from 4 projects. Reconstructed genomes were obtained from each project using 3 assemblers (IDBA, MetaVelvet, and SPAdes) and 2 binning tools (GroopM and MetaBat). We assessed the efficiency of the assemblers using statistics that included contig continuity and contig chimerism, and the effectiveness of the binning tools using genome completeness and taxonomic identification.

RESULTS: SPAdes assembled more contigs (143,718 ± 124 contigs) of longer length (N50 = 1632 ± 108 bp) and incorporated the most sequences (sequences-assembled = 19.65%). Microbial richness and evenness were maintained across the assembly, suggesting few chimeric contigs. Compared with the other assemblers, the SPAdes assembly was responsive to the biological and technological variation within each project. Among the binning tools, MetaBat produced bins with less variation in GC content (average standard deviation: 1.49), low species richness (4.91 ± 0.66), and higher genome completeness (40.92 ± 1.75) across all projects. MetaBat extracted 115 bins from the 4 projects, of which 66 were identified as reconstructed metagenome-assembled genomes with sequences belonging to a specific genus. We identified 13 novel genomes, some of which were 100% complete but showed low similarity to genomes in databases.

CONCLUSIONS: We present a set of biologically relevant parameters for evaluating and selecting optimal assembly and binning tools. Of the tools we tested, the SPAdes assembler and the MetaBat binning tool reconstructed quality metagenome-assembled genomes for the four projects. We also conclude that metagenomes from microbial communities with high coverage of phylogenetically distinct taxa and low taxonomic diversity yield the highest-quality metagenome-assembled genomes.
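Two of the statistics quoted above, contig N50 and the spread of GC content within a bin, can be illustrated with a minimal Python sketch. The abstract does not say how the authors computed either value, so the function names, the exclusion of ambiguous bases from the GC denominator, and the choice of a population (rather than sample) standard deviation are all assumptions made for illustration only.

```python
from statistics import pstdev


def n50(lengths):
    """Return the contig N50: the largest length L such that contigs of
    length >= L together cover at least half of the total assembly."""
    if not lengths:
        raise ValueError("need at least one contig length")
    total = sum(lengths)
    running = 0
    # Walk from the longest contig down until half the assembly is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


def gc_percent(sequence):
    """Return GC content as a percentage of unambiguous bases (A, C, G, T).
    Ignoring N and other ambiguity codes is an assumption of this sketch."""
    sequence = sequence.upper()
    called = sum(sequence.count(base) for base in "ACGT")
    if called == 0:
        raise ValueError("sequence has no unambiguous bases")
    return 100.0 * (sequence.count("G") + sequence.count("C")) / called


def gc_spread(contigs):
    """Return the population standard deviation of per-contig GC content
    for the contigs in one bin (hypothetical helper, not the authors' code)."""
    return pstdev([gc_percent(seq) for seq in contigs])
```

Under these assumptions, a bin whose contigs all share a similar GC percentage gives a small `gc_spread`, which is the sense in which low variation in GC content is read as a sign of a coherent bin.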