Machine learning approaches in microbiome research: challenges and best practices.

Author information

Papoutsoglou Georgios, Tarazona Sonia, Lopes Marta B, Klammsteiner Thomas, Ibrahimi Eliana, Eckenberger Julia, Novielli Pierfrancesco, Tonda Alberto, Simeon Andrea, Shigdel Rajesh, Béreux Stéphane, Vitali Giacomo, Tangaro Sabina, Lahti Leo, Temko Andriy, Claesson Marcus J, Berland Magali

Affiliations

Department of Computer Science, University of Crete, Heraklion, Greece.

JADBio Gnosis DA S.A., Science and Technology Park of Crete, Heraklion, Greece.

Publication information

Front Microbiol. 2023 Sep 22;14:1261889. doi: 10.3389/fmicb.2023.1261889. eCollection 2023.

DOI:10.3389/fmicb.2023.1261889
PMID:37808286
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10556866/
Abstract

Predictive analysis of microbiome data within a machine learning (ML) workflow presents numerous domain-specific challenges involving preprocessing, feature selection, predictive modeling, performance estimation, model interpretation, and the extraction of biological information from the results. To assist decision-making, we offer a set of recommendations on algorithm selection, pipeline creation, and evaluation, stemming from the COST Action ML4Microbiome. We compared the suggested approaches on a multi-cohort shotgun metagenomics dataset of colorectal cancer patients, focusing on their performance in disease diagnosis and biomarker discovery. We demonstrate that using compositional transformations and filtering methods as part of data preprocessing does not always improve the predictive performance of a model. In contrast, multivariate feature selection, such as the Statistically Equivalent Signatures algorithm, was effective in reducing the classification error. When validated on a separate test dataset, this algorithm combined with random forest modeling provided the most accurate performance estimates. Lastly, we show how linear modeling by logistic regression, coupled with visualization techniques such as Individual Conditional Expectation (ICE) plots, can yield interpretable results and offer biological insights. These findings are significant for clinicians and non-experts alike in translational applications.
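The abstract names several pipeline steps without showing them. As a rough illustration only, the Python sketch below (standard library only) applies a prevalence filter, a centered log-ratio (CLR) compositional transform, and Individual Conditional Expectation (ICE) curves for a logistic regression model. The 50% prevalence threshold, the 0.5 pseudocount, the order of filtering before CLR, the toy count table, and the regression coefficients are all hypothetical choices, not the study's settings or results; the Statistically Equivalent Signatures algorithm and random forest are not reproduced here.

```python
import math
from typing import List, Sequence


def prevalence_filter(counts: List[List[float]], min_prevalence: float = 0.1) -> List[int]:
    """Indices of taxa (columns) with a nonzero count in at least
    `min_prevalence` of samples. The default threshold is an illustrative assumption."""
    n_samples = len(counts)
    keep = []
    for j in range(len(counts[0])):
        present = sum(1 for row in counts if row[j] > 0)
        if present / n_samples >= min_prevalence:
            keep.append(j)
    return keep


def clr(row: Sequence[float], pseudocount: float = 0.5) -> List[float]:
    """Centered log-ratio transform of one sample: log of each part minus the
    mean log. A pseudocount (assumed value) is added so zero counts can be logged."""
    logs = [math.log(x + pseudocount) for x in row]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]


def predict_proba(x: Sequence[float], coef: Sequence[float], intercept: float) -> float:
    """Logistic regression probability of the positive class: sigmoid(intercept + coef . x)."""
    z = intercept + sum(c * v for c, v in zip(coef, x))
    return 1.0 / (1.0 + math.exp(-z))


def ice_curves(X: List[List[float]], feature: int, grid: Sequence[float],
               coef: Sequence[float], intercept: float) -> List[List[float]]:
    """One ICE curve per sample: sweep `feature` over `grid`, keep that sample's
    other features fixed, and record the predicted probability at each grid value."""
    curves = []
    for x in X:
        curve = []
        for value in grid:
            x_mod = list(x)
            x_mod[feature] = value
            curve.append(predict_proba(x_mod, coef, intercept))
        curves.append(curve)
    return curves


# Toy count table: 3 samples x 4 taxa (made-up numbers, not study data).
counts = [
    [10, 0, 5, 85],
    [20, 0, 0, 80],
    [5, 1, 4, 90],
]
kept = prevalence_filter(counts, min_prevalence=0.5)  # taxon 1 is present in 1/3 samples, so dropped
X = [clr([row[j] for j in kept]) for row in counts]

# Hypothetical coefficients standing in for a fitted logistic regression model.
coef, intercept = [0.8, -0.3, 0.5], -0.2
grid = [-3.0, -1.5, 0.0, 1.5, 3.0]
for curve in ice_curves(X, feature=0, grid=grid, coef=coef, intercept=intercept):
    print(["%.3f" % p for p in curve])
```

Because the hypothetical coefficient for the first kept taxon is positive, every ICE curve rises with it; the curves differ because each sample's other features shift the logit. Averaging the curves would give a partial dependence plot, whereas ICE keeps them separate so heterogeneity between samples stays visible.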

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/888f/10556866/0eb4dff7bb74/fmicb-14-1261889-g006.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/888f/10556866/fa821686759c/fmicb-14-1261889-g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/888f/10556866/17a407315954/fmicb-14-1261889-g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/888f/10556866/44b09ace4d46/fmicb-14-1261889-g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/888f/10556866/7a0957d32897/fmicb-14-1261889-g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/888f/10556866/613f006c4d2b/fmicb-14-1261889-g005.jpg

Similar articles

1
Machine learning approaches in microbiome research: challenges and best practices.
Front Microbiol. 2023 Sep 22;14:1261889. doi: 10.3389/fmicb.2023.1261889. eCollection 2023.
2
Can Predictive Modeling Tools Identify Patients at High Risk of Prolonged Opioid Use After ACL Reconstruction?
Clin Orthop Relat Res. 2020 Jul;478(7):0-1618. doi: 10.1097/CORR.0000000000001251.
3
Gene-based microbiome representation enhances host phenotype classification.
mSystems. 2023 Aug 31;8(4):e0053123. doi: 10.1128/msystems.00531-23. Epub 2023 Jul 5.
4
Advancing microbiome research with machine learning: key findings from the ML4Microbiome COST action.
Front Microbiol. 2023 Sep 25;14:1257002. doi: 10.3389/fmicb.2023.1257002. eCollection 2023.
5
Machine learning algorithms for outcome prediction in (chemo)radiotherapy: An empirical comparison of classifiers.
Med Phys. 2018 Jul;45(7):3449-3459. doi: 10.1002/mp.12967. Epub 2018 Jun 13.
6
Diagnosis of Inflammatory Bowel Disease and Colorectal Cancer through Multi-View Stacked Generalization Applied on Gut Microbiome Data.
Diagnostics (Basel). 2022 Oct 17;12(10):2514. doi: 10.3390/diagnostics12102514.
7
The future of Cochrane Neonatal.
Early Hum Dev. 2020 Nov;150:105191. doi: 10.1016/j.earlhumdev.2020.105191. Epub 2020 Sep 12.
8
Just Add Data: automated predictive modeling for knowledge discovery and feature selection.
NPJ Precis Oncol. 2022 Jun 16;6(1):38. doi: 10.1038/s41698-022-00274-8.
9
Interpretable and accurate prediction models for metagenomics data.
Gigascience. 2020 Mar 1;9(3). doi: 10.1093/gigascience/giaa010.
10
Inflammatory bowel disease biomarkers of human gut microbiota selected via different feature selection methods.
PeerJ. 2022 Apr 25;10:e13205. doi: 10.7717/peerj.13205. eCollection 2022.

Cited by

1
Personalized colorectal cancer risk assessment through explainable AI and Gut microbiome profiling.
Gut Microbes. 2025 Dec;17(1):2543124. doi: 10.1080/19490976.2025.2543124. Epub 2025 Aug 4.
2
Machine Learning Framework for Ovarian Cancer Diagnostics Using Plasma Lipidomics and Metabolomics.
Int J Mol Sci. 2025 Jul 10;26(14):6630. doi: 10.3390/ijms26146630.
3
Machine learning integrates region-specific microbial signatures to distinguish geographically adjacent populations within a province.
Front Microbiol. 2025 Jul 11;16:1586195. doi: 10.3389/fmicb.2025.1586195. eCollection 2025.
4
Sustainable Innovations in Food Microbiology: Fermentation, Biocontrol, and Functional Foods.
Foods. 2025 Jun 30;14(13):2320. doi: 10.3390/foods14132320.
5
Application of Predictive Modeling and Molecular Simulations to Elucidate the Mechanisms Underlying the Antimicrobial Activity of Sage ( L.) Components in Fresh Cheese Production.
Foods. 2025 Jun 20;14(13):2164. doi: 10.3390/foods14132164.
6
Identifying Optimal Machine Learning Approaches for Microbiome-Metabolomics Integration with Stable Feature Selection.
bioRxiv. 2025 Jun 30:2025.06.21.660858. doi: 10.1101/2025.06.21.660858.
7
Development and validation of machine learning models for predicting blastocyst yield in IVF cycles.
Sci Rep. 2025 Jul 2;15(1):22631. doi: 10.1038/s41598-025-06998-4.
8
Exploring the gut microbiome's influence on cancer-associated anemia: Mechanisms, clinical challenges, and innovative therapies.
World J Gastrointest Pharmacol Ther. 2025 Jun 5;16(2):105375. doi: 10.4292/wjgpt.v16.i2.105375.
9
Differential intestinal microbiome response to heat stress in two rabbit maternal lines: a comparative analysis using Random Forest, BayesC, and PLS-DA.
J Anim Sci. 2025 Jan 4;103. doi: 10.1093/jas/skaf206.
10
Network-based representation learning reveals the impact of age and diet on the gut microbial and metabolomic environment of U.S. infants in a randomized controlled feeding trial.
bioRxiv. 2025 May 22:2024.11.01.621627. doi: 10.1101/2024.11.01.621627.