
Identifying Optimal Machine Learning Approaches for Microbiome-Metabolomics Integration with Stable Feature Selection.

Authors

Palmer Suzette N, Mishra Animesh, Gan Shuheng, Liu Dajiang, Koh Andrew Y, Zhan Xiaowei

Affiliations

Division of Hematology/Oncology, Department of Pediatrics, The University of Texas Southwestern Medical Center, Dallas, TX 75390, USA.

Department of Biomedical Engineering, The University of Texas Southwestern Medical Center, Dallas, TX 75390, USA.

Publication

bioRxiv. 2025 Jun 30:2025.06.21.660858. doi: 10.1101/2025.06.21.660858.

DOI:10.1101/2025.06.21.660858
PMID:40631202
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12236860/
Abstract

Microbiome research has been limited by methodological inconsistencies. Taxonomy-based profiling presents challenges such as data sparsity, variable taxonomic resolution, and the reliance on DNA-based profiling, which provides limited functional insight. Multi-omics integration has emerged as a promising approach to link microbiome composition with function. However, the lack of standardized methodologies and inconsistencies in machine learning strategies have hindered reproducibility. Additionally, while machine learning can be used to identify key microbial and metabolic features, the stability of feature selection across models and data types remains underexplored, despite its importance for downstream experimental validation and biomarker discovery. Here, we systematically compare Elastic Net, Random Forest, and XGBoost across five multi-omics integration strategies: Concatenation, Averaged Stacking, Weighted Non-negative Least Squares (NNLS), Lasso Stacking, and Partial Least Squares (PLS), as well as individual 'omics models. We evaluate performance across 588 binary and 735 continuous models using microbiome-derived metabolomics and taxonomic data. Additionally, we assess the impact of feature reduction on model performance and feature selection stability. Among the approaches tested, Random Forest combined with NNLS yielded the highest overall performance across diverse datasets. Tree-based methods also demonstrated consistent feature selection across data types and dimensionalities. These results demonstrate how integration strategies, algorithm selection, data dimensionality, and response type impact both predictive performance and the stability of selected features in multi-omics microbiome modeling.
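The abstract names Weighted NNLS as one of the integration strategies but does not describe its implementation. As a rough illustration only, the sketch below assumes two base models (one trained on taxonomic data, one on metabolomics) whose held-out predictions are blended with non-negative weights chosen to minimize squared error. The function names `nnls_two_models` and `blend` and the toy numbers are hypothetical, and this is not the authors' pipeline. With only two base models the constrained problem can be solved exactly by checking each possible set of non-zero weights, so no external solver is needed.

```python
from typing import List, Sequence, Tuple


def nnls_two_models(
    p1: Sequence[float], p2: Sequence[float], y: Sequence[float]
) -> Tuple[float, float]:
    """Exact non-negative least squares weights for two prediction vectors.

    Minimizes sum((w1*p1 + w2*p2 - y)^2) subject to w1 >= 0 and w2 >= 0
    by evaluating the least-squares solution on every possible support
    (no model, model 1 only, model 2 only, both) and keeping the best
    feasible candidate.
    """
    a11 = sum(a * a for a in p1)
    a22 = sum(b * b for b in p2)
    a12 = sum(a * b for a, b in zip(p1, p2))
    b1 = sum(a * t for a, t in zip(p1, y))
    b2 = sum(b * t for b, t in zip(p2, y))

    def sse(w1: float, w2: float) -> float:
        return sum((w1 * a + w2 * b - t) ** 2 for a, b, t in zip(p1, p2, y))

    candidates = [(0.0, 0.0)]
    if a11 > 0:
        # Single-model fit; a negative solution collapses to weight 0.
        candidates.append((max(b1 / a11, 0.0), 0.0))
    if a22 > 0:
        candidates.append((0.0, max(b2 / a22, 0.0)))
    det = a11 * a22 - a12 * a12
    if det > 1e-12:
        # Unconstrained two-model fit via the 2x2 normal equations.
        w1 = (b1 * a22 - b2 * a12) / det
        w2 = (a11 * b2 - a12 * b1) / det
        if w1 >= 0 and w2 >= 0:
            candidates.append((w1, w2))
    return min(candidates, key=lambda w: sse(*w))


def blend(
    p1: Sequence[float], p2: Sequence[float], weights: Tuple[float, float]
) -> List[float]:
    """Combine two prediction vectors with the learned weights."""
    w1, w2 = weights
    return [w1 * a + w2 * b for a, b in zip(p1, p2)]


# Toy held-out predictions from the two hypothetical base models.
taxa_pred = [1.0, 0.0, 2.0, 1.0]
metab_pred = [0.0, 1.0, 1.0, 3.0]
response = [0.3, 0.7, 1.3, 2.4]

weights = nnls_two_models(taxa_pred, metab_pred, response)
stacked = blend(taxa_pred, metab_pred, weights)
```

In practice the weights would be fit on out-of-fold predictions so that the blend is not tuned on data the base models have already seen. With more than two base models, a general-purpose NNLS solver would replace the support enumeration used here.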

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c47e/12236860/91a7bf9bf3c8/nihpp-2025.06.21.660858v2-f0001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c47e/12236860/9b8d243cd347/nihpp-2025.06.21.660858v2-f0002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c47e/12236860/c8cc330b767e/nihpp-2025.06.21.660858v2-f0003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c47e/12236860/24863dc66627/nihpp-2025.06.21.660858v2-f0004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c47e/12236860/e8f4a5d7be2f/nihpp-2025.06.21.660858v2-f0005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c47e/12236860/6c32a8282785/nihpp-2025.06.21.660858v2-f0006.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c47e/12236860/58c14f59a34e/nihpp-2025.06.21.660858v2-f0007.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c47e/12236860/add53aab8d81/nihpp-2025.06.21.660858v2-f0008.jpg
