Assessment of metagenomic assembly using simulated next generation sequencing data.

Affiliation

European Molecular Biology Laboratory, Heidelberg, Germany.

Publication information

PLoS One. 2012;7(2):e31386. doi: 10.1371/journal.pone.0031386. Epub 2012 Feb 23.

DOI:10.1371/journal.pone.0031386

PMID:22384016

Full text link: https://pmc.ncbi.nlm.nih.gov/articles/PMC3285633/

Abstract

Due to the complexity of the protocols and a limited knowledge of the nature of microbial communities, simulating metagenomic sequences plays an important role in testing the performance of existing tools and data analysis methods with metagenomic data. We developed metagenomic read simulators with platform-specific (Sanger, pyrosequencing, Illumina) base-error models, and simulated metagenomes of differing community complexities. We first evaluated the effect of rigorous quality control on Illumina data. Although quality filtering removed a large proportion of the data, it greatly improved the accuracy and contig lengths of resulting assemblies. We then compared the quality-trimmed Illumina assemblies to those from Sanger and pyrosequencing. For the simple community (10 genomes) all sequencing technologies assembled a similar amount and accurately represented the expected functional composition. For the more complex community (100 genomes) Illumina produced the best assemblies and more correctly resembled the expected functional composition. For the most complex community (400 genomes) there was very little assembly of reads from any sequencing technology. However, due to the longer read length the Sanger reads still represented the overall functional composition reasonably well. We further examined the effect of scaffolding of contigs using paired-end Illumina reads. It dramatically increased contig lengths of the simple community and yielded minor improvements to the more complex communities. Although the increase in contig length was accompanied by increased chimericity, it resulted in more complete genes and a better characterization of the functional repertoire. The metagenomic simulators developed for this research are freely available.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1c85/3285633/8e0c98dd9b0e/pone.0031386.g001.jpg

Similar articles

Evaluation of short read metagenomic assembly.

BMC Genomics. 2011;12 Suppl 2(Suppl 2):S8. doi: 10.1186/1471-2164-12-S2-S8. Epub 2011 Jul 27.

Improved assemblies using a source-agnostic pipeline for MetaGenomic Assembly by Merging (MeGAMerge) of contigs.

Sci Rep. 2014 Oct 1;4:6480. doi: 10.1038/srep06480.

MinION™ nanopore sequencing of environmental metagenomes: a synthetic approach.

Gigascience. 2017 Mar 1;6(3):1-10. doi: 10.1093/gigascience/gix007.

Comparison of different assembly and annotation tools on analysis of simulated viral metagenomic communities in the gut.

BMC Genomics. 2014 Jan 18;15:37. doi: 10.1186/1471-2164-15-37.

A comprehensive investigation of metagenome assembly by linked-read sequencing.

Microbiome. 2020 Nov 11;8(1):156. doi: 10.1186/s40168-020-00929-3.

Metagenomic Assembly: Reconstructing Genomes from Metagenomes.

Methods Mol Biol. 2021;2242:139-152. doi: 10.1007/978-1-0716-1099-2_9.

Assessment of metagenomic assemblers based on hybrid reads of real and simulated metagenomic sequences.

Brief Bioinform. 2020 May 21;21(3):777-790. doi: 10.1093/bib/bbz025.

Fragmentation and Coverage Variation in Viral Metagenome Assemblies, and Their Effect in Diversity Calculations.

Front Bioeng Biotechnol. 2015 Sep 17;3:141. doi: 10.3389/fbioe.2015.00141. eCollection 2015.

Subset selection of high-depth next generation sequencing reads for de novo genome assembly using MapReduce framework.

BMC Genomics. 2015;16 Suppl 12(Suppl 12):S9. doi: 10.1186/1471-2164-16-S12-S9. Epub 2015 Dec 9.

Cited by

Decoding the SCFA-CpxAR-OMP axis as a dietary checkpoint against antimicrobial resistance transmission across gut-environment interfaces.

ISME J. 2025 Jan 2;19(1). doi: 10.1093/ismejo/wraf156.

ganon2: up-to-date and scalable metagenomics analysis.

NAR Genom Bioinform. 2025 Jul 17;7(3):lqaf094. doi: 10.1093/nargab/lqaf094. eCollection 2025 Sep.

Altered gut microbiota in erectile dysfunction patients: a pilot study.

Front Microbiol. 2025 Jun 5;16:1530014. doi: 10.3389/fmicb.2025.1530014. eCollection 2025.

Multi-Metagenome Analysis Unravels Community Collapse After Sampling and Hints the Cultivation Strategy of CPR Bacteria in Groundwater.

Microorganisms. 2025 Apr 24;13(5):972. doi: 10.3390/microorganisms13050972.

Differences in Microbial Community Structure Determine the Functional Specialization of Gut Segments of .

Microorganisms. 2025 Apr 2;13(4):808. doi: 10.3390/microorganisms13040808.

Bracken: estimating species abundance in metagenomics data.

PeerJ Comput Sci. 2017;3. doi: 10.7717/peerj-cs.104. Epub 2017 Jan 2.

Comprehensive analysis of orthologous genes reveals functional dynamics and energy metabolism in the rhizospheric microbiome of Moringa oleifera.

Funct Integr Genomics. 2025 Apr 7;25(1):82. doi: 10.1007/s10142-025-01580-7.

Metagenomic Characterization of the Soil Rhizosphere: Uncovering Microbial Networks for Nutrient Acquisition and Plant Resilience in Arid Ecosystems.

Genes (Basel). 2025 Feb 26;16(3):285. doi: 10.3390/genes16030285.

The beneficial effects of a probiotic mix on bone and lean mass are dependent on the diet in female mice.

Sci Rep. 2025 Feb 20;15(1):6182. doi: 10.1038/s41598-025-91056-2.

MBCN: A novel reference database for Effcient Metagenomic analysis of human gut microbiome.

Heliyon. 2024 Sep 6;10(18):e37422. doi: 10.1016/j.heliyon.2024.e37422. eCollection 2024 Sep 30.
