A comprehensive overview of microbiome data in the light of machine learning applications: categorization, accessibility, and future directions.

Author information

Kumar Bablu, Lorusso Erika, Fosso Bruno, Pesole Graziano

Affiliations

Università degli Studi di Milano, Milan, Italy.

Department of Biosciences, Biotechnology and Environment, University of Bari A. Moro, Bari, Italy.

Publication information

Front Microbiol. 2024 Feb 13;15:1343572. doi: 10.3389/fmicb.2024.1343572. eCollection 2024.

DOI: 10.3389/fmicb.2024.1343572

PMID: 38419630

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10900530/

Abstract

Metagenomics, metabolomics, and metaproteomics have significantly advanced our knowledge of microbial communities by providing culture-independent insights into their composition and functional potential. However, a critical challenge in this field is the lack of standardized and comprehensive metadata associated with raw data, which hinders robust data stratification and the consideration of confounding factors. In this comprehensive review, we categorize publicly available microbiome data into five types: shotgun sequencing, amplicon sequencing, metatranscriptomic, metabolomic, and metaproteomic data. We explore the importance of metadata for data reuse and address the challenges in collecting standardized metadata. We also assess the limitations in metadata collection of existing public repositories that collect metagenomic data. This review emphasizes the vital role of metadata in interpreting and comparing datasets and highlights the need for standardized metadata protocols to fully leverage the potential of metagenomic data. Furthermore, we explore future directions for implementing machine learning (ML) in metadata retrieval, offering promising avenues for a deeper understanding of microbial communities and their ecological roles. Leveraging these tools will enhance our insights into microbial functional capabilities and ecological dynamics in diverse ecosystems. Finally, we emphasize the crucial role of metadata in the development of ML models.
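The abstract's point that incomplete metadata limits stratification and confounder control can be illustrated with a minimal Python sketch. The sample records, the field names (`body_site`, `age`, `antibiotics`) and the helpers `completeness` and `stratify` are hypothetical; they do not correspond to any repository schema or to the authors' own method.

```python
# Hypothetical sample records; field names are illustrative, not a repository schema.
# None marks a metadata value that was never recorded.
SAMPLES = [
    {"id": "S1", "data_type": "shotgun", "body_site": "gut", "age": 34, "antibiotics": "no"},
    {"id": "S2", "data_type": "amplicon", "body_site": "gut", "age": None, "antibiotics": "yes"},
    {"id": "S3", "data_type": "shotgun", "body_site": None, "age": 51, "antibiotics": None},
    {"id": "S4", "data_type": "amplicon", "body_site": "skin", "age": 29, "antibiotics": "no"},
]


def completeness(samples, fields):
    """Return, for each field, the fraction of samples with a non-missing value."""
    n = len(samples)
    return {f: sum(s.get(f) is not None for s in samples) / n for f in fields}


def stratify(samples, by, confounders=()):
    """Group sample IDs by the value of `by`.

    A sample is kept only if `by` and every confounder field are recorded;
    samples missing any of them cannot be assigned to a stratum or adjusted
    for the confounders, so they drop out of the analysis.
    """
    required = (by, *confounders)
    usable = [s for s in samples if all(s.get(f) is not None for f in required)]
    groups = {}
    for s in usable:
        groups.setdefault(s[by], []).append(s["id"])
    return groups


if __name__ == "__main__":
    print(completeness(SAMPLES, ["body_site", "age", "antibiotics"]))
    print(stratify(SAMPLES, "body_site"))
    print(stratify(SAMPLES, "body_site", confounders=("age", "antibiotics")))
```

Because S2 lacks `age` and S3 lacks `body_site`, only two of the four samples remain usable once age and antibiotic use must also be recorded, even though each field is 75% complete on its own.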

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1ff4/10900530/0400e56cd713/fmicb-15-1343572-g0001.jpg

Similar articles

A comprehensive overview of microbiome data in the light of machine learning applications: categorization, accessibility, and future directions.

Front Microbiol. 2024 Feb 13;15:1343572. doi: 10.3389/fmicb.2024.1343572. eCollection 2024.

SKIOME Project: a curated collection of skin microbiome datasets enriched with study-related metadata.

Database (Oxford). 2022 May 16;2022. doi: 10.1093/database/baac033.

Machine Learning Meta-analysis of Large Metagenomic Datasets: Tools and Biological Insights.

PLoS Comput Biol. 2016 Jul 11;12(7):e1004977. doi: 10.1371/journal.pcbi.1004977. eCollection 2016 Jul.

Community-curated and standardised metadata of published ancient metagenomic samples with AncientMetagenomeDir.

Sci Data. 2021 Jan 26;8(1):31. doi: 10.1038/s41597-021-00816-y.

Functional dynamics of bacterial species in the mouse gut microbiome revealed by metagenomic and metatranscriptomic analyses.

PLoS One. 2020 Jan 24;15(1):e0227886. doi: 10.1371/journal.pone.0227886. eCollection 2020.

HumanMetagenomeDB: a public repository of curated and standardized metadata for human metagenomes.

Nucleic Acids Res. 2021 Jan 8;49(D1):D743-D750. doi: 10.1093/nar/gkaa1031.

Practical considerations for sampling and data analysis in contemporary metagenomics-based environmental studies.

J Microbiol Methods. 2018 Nov;154:14-18. doi: 10.1016/j.mimet.2018.09.020. Epub 2018 Oct 1.

MISIP: a data standard for the reuse and reproducibility of any stable isotope probing-derived nucleic acid sequence and experiment.

Gigascience. 2024 Jan 2;13. doi: 10.1093/gigascience/giae071.

Microbial resolution of whole genome shotgun and 16S amplicon metagenomic sequencing using publicly available NEON data.

PLoS One. 2020 Feb 13;15(2):e0228899. doi: 10.1371/journal.pone.0228899. eCollection 2020.

The AnimalAssociatedMetagenomeDB reveals a bias towards livestock and developed countries and blind spots in functional-potential studies of animal-associated microbiomes.

Anim Microbiome. 2023 Oct 5;5(1):48. doi: 10.1186/s42523-023-00267-3.

Cited by

Identifying Optimal Machine Learning Approaches for Microbiome-Metabolomics Integration with Stable Feature Selection.

bioRxiv. 2025 Jun 30:2025.06.21.660858. doi: 10.1101/2025.06.21.660858.

The gut-immune axis in primary immune thrombocytopenia (ITP): a paradigm shifts in treatment approaches.

Front Immunol. 2025 Jun 12;16:1595977. doi: 10.3389/fimmu.2025.1595977. eCollection 2025.

Do we need a standardized 16S rRNA gene amplicon sequencing analysis protocol for poultry microbiota research?

Poult Sci. 2025 Jul;104(7):105242. doi: 10.1016/j.psj.2025.105242. Epub 2025 May 1.

Advanced computational tools, artificial intelligence and machine-learning approaches in gut microbiota and biomarker identification.

Front Med Technol. 2025 Apr 15;6:1434799. doi: 10.3389/fmedt.2024.1434799. eCollection 2024.

A cost and community perspective on the barriers to microbiome data reuse.

Front Bioinform. 2025 Apr 9;5:1585717. doi: 10.3389/fbinf.2025.1585717. eCollection 2025.

Engineering Useful Microbial Species for Pharmaceutical Applications.

Microorganisms. 2025 Mar 5;13(3):599. doi: 10.3390/microorganisms13030599.

The Role of Gut Microbiota Dysbiosis in Erectile Dysfunction: From Pathophysiology to Treatment Strategies.

Microorganisms. 2025 Jan 23;13(2):250. doi: 10.3390/microorganisms13020250.

Microbiome Integrity Enhances the Efficacy and Safety of Anticancer Drug.

Biomedicines. 2025 Feb 10;13(2):422. doi: 10.3390/biomedicines13020422.

Is Short-Read 16S rRNA Sequencing of Oral Microbiome Sampling a Suitable Diagnostic Tool for Head and Neck Cancer?

Pathogens. 2024 Sep 24;13(10):826. doi: 10.3390/pathogens13100826.

A Proteogenomic Approach to Unveiling the Complex Biology of the Microbiome.

Int J Mol Sci. 2024 Sep 28;25(19):10467. doi: 10.3390/ijms251910467.
