Overview of data preprocessing for machine learning applications in human microbiome research.

Authors

Ibrahimi Eliana, Lopes Marta B, Dhamo Xhilda, Simeon Andrea, Shigdel Rajesh, Hron Karel, Stres Blaž, D'Elia Domenica, Berland Magali, Marcos-Zambrano Laura Judith

Affiliations

Department of Biology, Faculty of Natural Sciences, University of Tirana, Tirana, Albania.

Department of Mathematics, Center for Mathematics and Applications (NOVA Math), NOVA School of Science and Technology, Caparica, Portugal.

Publication information

Front Microbiol. 2023 Oct 5;14:1250909. doi: 10.3389/fmicb.2023.1250909. eCollection 2023.

DOI:10.3389/fmicb.2023.1250909
PMID:37869650
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10588656/
Abstract

Although metagenomic sequencing is now the preferred technique to study microbiome-host interactions, analyzing and interpreting microbiome sequencing data presents challenges primarily attributed to the statistical specificities of the data (e.g., sparse, over-dispersed, compositional, inter-variable dependency). This mini review explores preprocessing and transformation methods applied in recent human microbiome studies to address microbiome data analysis challenges. Our results indicate a limited adoption of transformation methods targeting the statistical characteristics of microbiome sequencing data. Instead, there is a prevalent usage of relative and normalization-based transformations that do not specifically account for the specific attributes of microbiome data. The information on preprocessing and transformations applied to the data before analysis was incomplete or missing in many publications, leading to reproducibility concerns, comparability issues, and questionable results. We hope this mini review will provide researchers and newcomers to the field of human microbiome research with an up-to-date point of reference for various data transformation tools and assist them in choosing the most suitable transformation method based on their research questions, objectives, and data characteristics.
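
The contrast the abstract draws, between relative (proportion-based) transformations and transformations that target the compositional nature of the data, can be illustrated with a minimal sketch. The centered log-ratio (CLR) transform is used here only as one common example of a compositionally aware transformation; the abstract itself does not name it. The pseudocount of 0.5, the toy count table, and the function names are assumptions made for illustration, not values or methods taken from the review.

```python
import math

def relative_abundance(sample):
    """Total-sum scaling: divide each taxon count by the sample's read total."""
    total = sum(sample)
    if total == 0:
        raise ValueError("sample has no reads")
    return [count / total for count in sample]

def clr(sample, pseudocount=0.5):
    """Centered log-ratio: log of each value minus the mean log of the sample.

    The pseudocount (an assumed value) replaces zeros, which are frequent
    in sparse microbiome tables and have no logarithm.
    """
    logs = [math.log(count + pseudocount) for count in sample]
    mean_log = sum(logs) / len(logs)
    return [value - mean_log for value in logs]

# Hypothetical count table: rows are samples, columns are taxa.
counts = [
    [120, 0, 30, 850],
    [5, 0, 0, 95],
    [300, 40, 10, 650],
]

rel = [relative_abundance(sample) for sample in counts]
clr_values = [clr(sample) for sample in counts]

for r, c in zip(rel, clr_values):
    print([round(v, 3) for v in r], [round(v, 3) for v in c])
```

Relative abundances in each sample sum to 1, so the values remain constrained to a simplex; CLR values in each sample sum to 0 and are expressed relative to the sample's geometric mean. Which of the two suits a given analysis depends on the research question and the data, as the abstract notes.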


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2746/10588656/19c379675dea/fmicb-14-1250909-g001.jpg
