Machine Learning and Deep Learning Applications in Metagenomic Taxonomy and Functional Annotation.

Authors

Mathieu Alban, Leclercq Mickael, Sanabria Melissa, Perin Olivier, Droit Arnaud

Affiliations

Computational Biology Laboratory, CHU de Québec - Université Laval Research Centre, Québec City, QC, Canada.

Université Côte d'Azur, CNRS, INRIA, I3S, Nice, France.

Publication information

Front Microbiol. 2022 Mar 14;13:811495. doi: 10.3389/fmicb.2022.811495. eCollection 2022.

Abstract

Shotgun sequencing of environmental DNA (i.e., metagenomics) has revolutionized the field of environmental microbiology, allowing the characterization of all microorganisms in a sequencing experiment. To identify the microbes in terms of taxonomy and biological activity, the sequenced reads must necessarily be aligned on known microbial genomes/genes. However, current alignment methods are limited in terms of speed and can produce a significant number of false positives when detecting bacterial species or false negatives in specific cases (virus, plasmids, and gene detection). Moreover, recent advances in metagenomics have enabled the reconstruction of new genomes using binning strategies, but these genomes, not yet fully characterized, are not used in classic approaches, whereas machine and deep learning methods can use them as models. In this article, we attempted to review the different methods and their efficiency to improve the annotation of metagenomic sequences. Deep learning models have reached the performance of the widely used k-mer alignment-based tools, with better accuracy in certain cases; however, they still must demonstrate their robustness across the variety of environmental samples and across the rapid expansion of accessible genomes in databases.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d28a/8964132/a3467fb11737/fmicb-13-811495-g001.jpg
