Machine Learning and Deep Learning Applications in Metagenomic Taxonomy and Functional Annotation.

Author information

Mathieu Alban, Leclercq Mickael, Sanabria Melissa, Perin Olivier, Droit Arnaud

Affiliations

Computational Biology Laboratory, CHU de Québec - Université Laval Research Centre, Québec City, QC, Canada.

Université Côte d'Azur, CNRS, INRIA, I3S, Nice, France.

Publication information

Front Microbiol. 2022 Mar 14;13:811495. doi: 10.3389/fmicb.2022.811495. eCollection 2022.

DOI: 10.3389/fmicb.2022.811495

PMID: 35359727

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8964132/

Abstract

Shotgun sequencing of environmental DNA (i.e., metagenomics) has revolutionized the field of environmental microbiology, allowing the characterization of all microorganisms in a sequencing experiment. To identify the microbes in terms of taxonomy and biological activity, the sequenced reads must be aligned to known microbial genomes or genes. However, current alignment methods are limited in speed and can produce a significant number of false positives when detecting bacterial species, or false negatives in specific cases (viruses, plasmids, and gene detection). Moreover, recent advances in metagenomics have enabled the reconstruction of new genomes using binning strategies, but these genomes, not yet fully characterized, are not used by classic approaches, whereas machine learning and deep learning methods can use them as models. In this article, we review the different methods and their efficiency in improving the annotation of metagenomic sequences. Deep learning models have reached the performance of the widely used k-mer alignment-based tools, with better accuracy in certain cases; however, they have yet to demonstrate their robustness across the diversity of environmental samples and the rapid expansion of genomes available in databases.
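The abstract compares deep learning models with k-mer-based tools. One common way to encode a read for a machine learning classifier is a normalized k-mer frequency vector. The Python sketch below shows that encoding; it is not taken from any tool covered by the review. The helper name `kmer_profile`, the default `k=3`, and the choice to skip k-mers containing ambiguous bases such as `N` are all illustrative assumptions.

```python
from itertools import product
from typing import List


def kmer_profile(seq: str, k: int = 3) -> List[float]:
    """Hypothetical helper: normalized frequency of every DNA k-mer in seq.

    Assumption: k-mers containing characters other than A, C, G, T
    (e.g. the ambiguous base N) are skipped rather than counted.
    """
    alphabet = "ACGT"
    # Map each of the 4**k possible k-mers to a fixed vector position
    # (lexicographic order: AA..A, AA..C, ..., TT..T).
    index = {"".join(p): i for i, p in enumerate(product(alphabet, repeat=k))}
    counts = [0] * len(index)
    seq = seq.upper()
    # Slide a window of width k over the read, one base at a time.
    for start in range(len(seq) - k + 1):
        i = index.get(seq[start:start + k])
        if i is not None:  # None means the k-mer contains a non-ACGT base
            counts[i] += 1
    total = sum(counts)
    # Normalize so reads of different lengths are comparable.
    return [c / total for c in counts] if total else [0.0] * len(counts)


if __name__ == "__main__":
    profile = kmer_profile("ACGTACGTNACGT", k=2)
    print(len(profile))  # prints 16, i.e. 4**2 possible dinucleotides
```

A downstream classifier would then map this fixed-length vector to a taxon. The vector length grows as 4^k, so the value of k trades resolution against dimensionality.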



Similar articles

Deep learning models for bacteria taxonomic classification of metagenomic data.

BMC Bioinformatics. 2018 Jul 9;19(Suppl 7):198. doi: 10.1186/s12859-018-2182-6.

Evaluating metagenomics tools for genome binning with real metagenomic datasets and CAMI datasets.

BMC Bioinformatics. 2020 Jul 28;21(1):334. doi: 10.1186/s12859-020-03667-3.

MetaVW: Large-Scale Machine Learning for Metagenomics Sequence Classification.

Methods Mol Biol. 2018;1807:9-20. doi: 10.1007/978-1-4939-8561-6_2.

Large-scale machine learning for metagenomics sequence classification.

Bioinformatics. 2016 Apr 1;32(7):1023-32. doi: 10.1093/bioinformatics/btv683. Epub 2015 Nov 20.

METAnnotatorX2: a Comprehensive Tool for Deep and Shallow Metagenomic Data Set Analyses.

mSystems. 2021 Jun 29;6(3):e0058321. doi: 10.1128/mSystems.00583-21.

MSPminer: abundance-based reconstitution of microbial pan-genomes from shotgun metagenomic data.

Bioinformatics. 2019 May 1;35(9):1544-1552. doi: 10.1093/bioinformatics/bty830.

Selection of marker genes for genetic barcoding of microorganisms and binning of metagenomic reads by Barcoder software tools.

BMC Bioinformatics. 2018 Aug 30;19(1):309. doi: 10.1186/s12859-018-2320-1.

MBMC: An Effective Markov Chain Approach for Binning Metagenomic Reads from Environmental Shotgun Sequencing Projects.

OMICS. 2016 Aug;20(8):470-9. doi: 10.1089/omi.2016.0081. Epub 2016 Jul 22.

Comprehensive benchmarking and ensemble approaches for metagenomic classifiers.

Genome Biol. 2017 Sep 21;18(1):182. doi: 10.1186/s13059-017-1299-7.

Cited by

Hybrid Deep Learning Framework for High-Accuracy Classification of Morphologically Similar Puffball Species Using CNN and Transformer Architectures.

Biology (Basel). 2025 Jul 5;14(7):816. doi: 10.3390/biology14070816.

Genome-resolved metagenomics from short-read sequencing data in the era of artificial intelligence.

Funct Integr Genomics. 2025 Jun 10;25(1):124. doi: 10.1007/s10142-025-01625-x.

Cutting-edge deep-learning based tools for metagenomic research.

Natl Sci Rev. 2025 Feb 19;12(6):nwaf056. doi: 10.1093/nsr/nwaf056. eCollection 2025 Jun.

Reveal your microbes, and i'll reveal your origins: geographical traceability via Scomber colias intestinal tract metagenomics.

Anim Microbiome. 2025 May 6;7(1):43. doi: 10.1186/s42523-025-00398-9.

FGeneBERT: function-driven pre-trained gene language model for metagenomics.

Brief Bioinform. 2025 Mar 4;26(2). doi: 10.1093/bib/bbaf149.

A review of neural networks for metagenomic binning.

Brief Bioinform. 2025 Mar 4;26(2). doi: 10.1093/bib/bbaf065.

Deep learning in microbiome analysis: a comprehensive review of neural network models.

Front Microbiol. 2025 Jan 22;15:1516667. doi: 10.3389/fmicb.2024.1516667. eCollection 2024.

Impact of database choice and confidence score on the performance of taxonomic classification using Kraken2.

aBIOTECH. 2024 Jul 31;5(4):465-475. doi: 10.1007/s42994-024-00178-0. eCollection 2024 Dec.

Artificial intelligence and bioinformatics: a journey from traditional techniques to smart approaches.

Gastroenterol Hepatol Bed Bench. 2024;17(3):241-252. doi: 10.22037/ghfbb.v17i3.2977.

Human Gut Microbiota for Diagnosis and Treatment of Depression.

Int J Mol Sci. 2024 May 26;25(11):5782. doi: 10.3390/ijms25115782.

