LMAS: evaluating metagenomic short de novo assembly methods through defined communities.

Affiliations

Instituto de Microbiologia, Instituto de Medicina Molecular, Faculdade de Medicina, Universidade de Lisboa, 1649-028 Lisboa, Portugal.

Faculty of Health Sciences, Ben-Gurion University of the Negev, 8410501 Beer-Sheva, Israel.

Publication information

Gigascience. 2022 Dec 28;12. doi: 10.1093/gigascience/giac122.

DOI:10.1093/gigascience/giac122

PMID:36576131

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9795473/

Abstract

BACKGROUND

The de novo assembly of raw sequence data is key in metagenomic analysis. It allows recovering draft genomes from a pool of mixed raw reads, yielding longer sequences that offer contextual information and provide a more complete picture of the microbial community.

FINDINGS

To better compare de novo assemblers for metagenomic analysis, LMAS (Last Metagenomic Assembler Standing) was developed as a flexible platform allowing users to evaluate assembler performance given known standard communities. Overall, in our test datasets, k-mer De Bruijn graph assemblers outperformed the alternative approaches but came with a greater computational cost. Furthermore, assemblers branded as metagenomic specific did not consistently outperform other genomic assemblers in metagenomic samples. Some assemblers still in use, such as ABySS, MetaHipmer2, minia, and VelvetOptimiser, perform relatively poorly and should be used with caution when assembling complex samples. Meaningful strain resolution at the single-nucleotide polymorphism level was not achieved, even by the best assemblers tested.

CONCLUSIONS

The choice of a de novo assembler depends on the computational resources available, the replicon of interest, and the major goals of the analysis. No single assembler appeared an ideal choice for short-read metagenomic prokaryote replicon assembly, each showing specific strengths. The choice of metagenomic assembler should be guided by user requirements and characteristics of the sample of interest, and LMAS provides an interactive evaluation platform for this purpose. LMAS is open source, and the workflow and its documentation are available at https://github.com/B-UMMI/LMAS and https://lmas.readthedocs.io/, respectively.

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2fdf/9795473/3f716aa6ddd2/giac122fig1.jpg
