DeepMAsED: evaluating the quality of metagenomic assemblies.

Affiliations

Department of Empirical Inference, Max Planck Institute for Intelligent Systems, Tübingen 72076, Germany.

Department of Computer Science, ETH Zürich, Zürich 8092, Switzerland.

Publication information

Bioinformatics. 2020 May 1;36(10):3011-3017. doi: 10.1093/bioinformatics/btaa124.

Abstract

MOTIVATION

Methodological advances in metagenome assembly are rapidly increasing the number of published metagenome assemblies. However, identifying misassemblies is challenging because closely related reference genomes that could serve as a pseudo ground truth are often lacking. Existing reference-free methods are no longer maintained, can make strong assumptions that may not hold across diverse research projects, and have not been validated on large-scale metagenome assemblies.

RESULTS

We present DeepMAsED, a deep learning approach for identifying misassembled contigs without the need for reference genomes. Moreover, we provide an in silico pipeline for generating large-scale, realistic metagenome assemblies for comprehensive model training and testing. DeepMAsED's accuracy substantially exceeds that of the state of the art when applied to large and complex metagenome assemblies. Our model estimates a 1% contig misassembly rate in two recent large-scale metagenome assembly publications.
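A contig misassembly rate like the one quoted above is a proportion: contigs flagged as misassembled divided by all contigs evaluated. The minimal sketch below illustrates that arithmetic only, assuming a classifier that outputs one score in [0, 1] per contig and a hypothetical cutoff of 0.5. The function name, the threshold, and the example scores are illustrative assumptions, not DeepMAsED's actual code, output format, or published decision threshold.

```python
def estimated_misassembly_rate(scores, threshold=0.5):
    """Return the fraction of contigs whose score is >= threshold.

    `scores` is assumed (hypothetically) to hold one misassembly score
    in [0, 1] per contig; `threshold` is an illustrative cutoff.
    """
    if not scores:
        raise ValueError("need at least one contig score")
    flagged = sum(1 for s in scores if s >= threshold)
    return flagged / len(scores)


# Hypothetical scores for 200 contigs: 2 above the cutoff, 198 below it.
example_scores = [0.9, 0.8] + [0.1] * 198
rate = estimated_misassembly_rate(example_scores)
print(f"{rate:.1%}")  # prints 1.0%
```

Raising the threshold trades recall for precision, so any reported rate depends on the cutoff chosen as well as on the classifier itself.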

CONCLUSIONS

DeepMAsED accurately identifies misassemblies in metagenome-assembled contigs from a broad diversity of bacteria and archaea without the need for reference genomes or strong modeling assumptions. Running DeepMAsED is straightforward, as is retraining the model with our dataset generation pipeline. Therefore, DeepMAsED is a flexible misassembly classifier that can be applied to a wide range of metagenome assembly projects.

AVAILABILITY AND IMPLEMENTATION

DeepMAsED is available from GitHub at https://github.com/leylabmpi/DeepMAsED.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.