DeepMAsED: evaluating the quality of metagenomic assemblies.

Author information

Department of Empirical Inference, Max Planck Institute for Intelligent Systems, Tübingen 72076, Germany.

Department of Computer Science, ETH Zürich, Zürich 8092, Switzerland.

Publication information

Bioinformatics. 2020 May 1;36(10):3011-3017. doi: 10.1093/bioinformatics/btaa124.

DOI:10.1093/bioinformatics/btaa124

PMID:32096824

Abstract

MOTIVATION

Methodological advances in metagenome assembly are rapidly increasing the number of published metagenome assemblies. However, identifying misassemblies is challenging due to a lack of closely related reference genomes that can act as pseudo ground truth. Existing reference-free methods are no longer maintained, can make strong assumptions that may not hold across a diversity of research projects, and have not been validated on large-scale metagenome assemblies.

RESULTS

We present DeepMAsED, a deep learning approach for identifying misassembled contigs without the need for reference genomes. Moreover, we provide an in silico pipeline for generating large-scale, realistic metagenome assemblies for comprehensive model training and testing. DeepMAsED accuracy substantially exceeds the state-of-the-art when applied to large and complex metagenome assemblies. Our model estimates a 1% contig misassembly rate in two recent large-scale metagenome assembly publications.

CONCLUSIONS

DeepMAsED accurately identifies misassemblies in metagenome-assembled contigs from a broad diversity of bacteria and archaea without the need for reference genomes or strong modeling assumptions. Running DeepMAsED is straightforward, as is retraining the model with our dataset generation pipeline. Therefore, DeepMAsED is a flexible misassembly classifier that can be applied to a wide range of metagenome assembly projects.

AVAILABILITY AND IMPLEMENTATION

DeepMAsED is available from GitHub at https://github.com/leylabmpi/DeepMAsED.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.
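
The contig misassembly rate mentioned in the results can be illustrated with a minimal sketch. It assumes a classifier has already assigned each contig a misassembly score between 0 and 1, and that a contig counts as misassembled when its score reaches a chosen threshold. The function name `misassembly_rate`, the threshold of 0.5, and the scores below are hypothetical illustrations, not DeepMAsED's API or output.

```python
from typing import Mapping


def misassembly_rate(scores: Mapping[str, float], threshold: float = 0.5) -> float:
    """Return the fraction of contigs whose score is at or above the threshold.

    `scores` maps a contig ID to a classifier score in [0, 1]. Both the
    mapping layout and the default threshold are assumptions for this sketch.
    """
    if not scores:
        raise ValueError("no contigs were scored")
    flagged = sum(1 for score in scores.values() if score >= threshold)
    return flagged / len(scores)


# Made-up per-contig scores, for illustration only.
example_scores = {
    "contig_1": 0.02,
    "contig_2": 0.91,
    "contig_3": 0.10,
    "contig_4": 0.35,
}

rate = misassembly_rate(example_scores, threshold=0.5)
print(f"{rate:.2%} of contigs flagged as misassembled")
```

The estimated rate depends on the threshold, so any threshold would need to be chosen and reported alongside the rate.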

Similar articles

metaMIC: reference-free misassembly identification and correction of de novo metagenomic assemblies.

Genome Biol. 2022 Nov 14;23(1):242. doi: 10.1186/s13059-022-02810-y.

ResMiCo: Increasing the quality of metagenome-assembled genomes with deep learning.

PLoS Comput Biol. 2023 May 1;19(5):e1011001. doi: 10.1371/journal.pcbi.1011001. eCollection 2023 May.

GraphBin: refined binning of metagenomic contigs using assembly graphs.

Bioinformatics. 2020 Jun 1;36(11):3307-3313. doi: 10.1093/bioinformatics/btaa180.

MAGpy: a reproducible pipeline for the downstream analysis of metagenome-assembled genomes (MAGs).

Bioinformatics. 2019 Jun 1;35(12):2150-2152. doi: 10.1093/bioinformatics/bty905.

3CAC: improving the classification of phages and plasmids in metagenomic assemblies using assembly graphs.

Bioinformatics. 2022 Sep 16;38(Suppl_2):ii56-ii61. doi: 10.1093/bioinformatics/btac468.

Metaviral SPAdes: assembly of viruses from metagenomic data.

Bioinformatics. 2020 Aug 15;36(14):4126-4129. doi: 10.1093/bioinformatics/btaa490.

MetaQUAST: evaluation of metagenome assemblies.

Bioinformatics. 2016 Apr 1;32(7):1088-90. doi: 10.1093/bioinformatics/btv697. Epub 2015 Nov 26.

Struo: a pipeline for building custom databases for common metagenome profilers.

Bioinformatics. 2020 Apr 1;36(7):2314-2315. doi: 10.1093/bioinformatics/btz899.

CoCoNet: an efficient deep learning tool for viral metagenome binning.

Bioinformatics. 2021 Sep 29;37(18):2803-2810. doi: 10.1093/bioinformatics/btab213.

Cited by

Genome-resolved metagenomics from short-read sequencing data in the era of artificial intelligence.

Funct Integr Genomics. 2025 Jun 10;25(1):124. doi: 10.1007/s10142-025-01625-x.

Efficient De Novo Assembly and Recovery of Microbial Genomes from Complex Metagenomes Using a Reduced Set of k-mers.

Interdiscip Sci. 2025 Jun 2. doi: 10.1007/s12539-025-00722-6.

Deep learning in microbiome analysis: a comprehensive review of neural network models.

Front Microbiol. 2025 Jan 22;15:1516667. doi: 10.3389/fmicb.2024.1516667. eCollection 2024.

MAGs-centric crack: how long will, spore-positive and most , microsymbionts remain recalcitrant to axenic growth?

Front Microbiol. 2024 Jul 31;15:1367490. doi: 10.3389/fmicb.2024.1367490. eCollection 2024.

Long-read sequencing reveals extensive gut phageome structural variations driven by genetic exchange with bacterial hosts.

Sci Adv. 2024 Aug 16;10(33):eadn3316. doi: 10.1126/sciadv.adn3316. Epub 2024 Aug 14.

Unveiling microbial diversity: harnessing long-read sequencing technology.

Nat Methods. 2024 Jun;21(6):954-966. doi: 10.1038/s41592-024-02262-1. Epub 2024 Apr 30.

Integrating taxonomic signals from MAGs and contigs improves read annotation and taxonomic profiling of metagenomes.

Nat Commun. 2024 Apr 20;15(1):3373. doi: 10.1038/s41467-024-47155-1.

Deep learning methods in metagenomics: a review.

Microb Genom. 2024 Apr;10(4). doi: 10.1099/mgen.0.001231.

ContScout: sensitive detection and removal of contamination from annotated genomes.

Nat Commun. 2024 Jan 31;15(1):936. doi: 10.1038/s41467-024-45024-5.

ResMiCo: Increasing the quality of metagenome-assembled genomes with deep learning.

PLoS Comput Biol. 2023 May 1;19(5):e1011001. doi: 10.1371/journal.pcbi.1011001. eCollection 2023 May.
