Automated Machine Learning and Explainable AI (AutoML-XAI) for Metabolomics: Improving Cancer Diagnostics.

Affiliations

School of Chemistry and Biochemistry, Georgia Institute of Technology, Atlanta, Georgia 30332, United States.

Petit Institute of Bioengineering and Bioscience, Georgia Institute of Technology, Atlanta, Georgia 30332, United States.

Publication information

J Am Soc Mass Spectrom. 2024 Jun 5;35(6):1089-1100. doi: 10.1021/jasms.3c00403. Epub 2024 May 1.

DOI:10.1021/jasms.3c00403
PMID:38690775
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11157651/
Abstract

Metabolomics generates complex data necessitating advanced computational methods for generating biological insight. While machine learning (ML) is promising, the challenges of selecting the best algorithms and tuning hyperparameters, particularly for nonexperts, remain. Automated machine learning (AutoML) can streamline this process; however, the issue of interpretability could persist. This research introduces a unified pipeline that combines AutoML with explainable AI (XAI) techniques to optimize metabolomics analysis. We tested our approach on two data sets: renal cell carcinoma (RCC) urine metabolomics and ovarian cancer (OC) serum metabolomics. AutoML, using Auto-sklearn, surpassed standalone ML algorithms like SVM and k-Nearest Neighbors in differentiating between RCC and healthy controls, as well as OC patients and those with other gynecological cancers. The effectiveness of Auto-sklearn is highlighted by its AUC scores of 0.97 for RCC and 0.85 for OC, obtained from the unseen test sets. Importantly, on most of the metrics considered, Auto-sklearn demonstrated a better classification performance, leveraging a mix of algorithms and ensemble techniques. Shapley Additive Explanations (SHAP) provided a global ranking of feature importance, identifying dibutylamine and ganglioside GM(d34:1) as the top discriminative metabolites for RCC and OC, respectively. Waterfall plots offered local explanations by illustrating the influence of each metabolite on individual predictions. Dependence plots spotlighted metabolite interactions, such as the connection between hippuric acid and one of its derivatives in RCC, and between GM3(d34:1) and GM3(18:1_16:0) in OC, hinting at potential mechanistic relationships. Through decision plots, a detailed error analysis was conducted, contrasting feature importance for correctly versus incorrectly classified samples. 
In essence, our pipeline emphasizes the importance of harmonizing AutoML and XAI, facilitating both simplified ML application and improved interpretability in metabolomics data science.
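
The two evaluation ideas in the abstract, AUC on a held-out test set and a global feature ranking from Shapley values, can be illustrated with a minimal pure-Python sketch. This is not the authors' pipeline: the study used Auto-sklearn and SHAP, whereas the code below assumes a hypothetical hand-written logistic scorer, placeholder feature names, and synthetic data invented for illustration. It computes exact Shapley values by enumerating all feature coalitions, which is only practical for a handful of features; libraries such as SHAP rely on faster approximate or model-specific algorithms.

```python
from itertools import combinations
from math import exp, factorial

# Hypothetical placeholder names and weights; not metabolites or values from the study.
FEATURES = ["metabolite_a", "metabolite_b", "metabolite_c"]
WEIGHTS = [2.0, -0.5, 0.0]


def predict(row):
    """Toy logistic scorer standing in for a fitted classifier."""
    z = sum(w * v for w, v in zip(WEIGHTS, row))
    return 1.0 / (1.0 + exp(-z))


def auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def shapley_values(model, x, background):
    """Exact Shapley values for model(x), relative to the mean prediction on background.

    Features outside a coalition take their values from each background row in turn.
    """
    n = len(x)

    def value(coalition):
        total = 0.0
        for row in background:
            mixed = [x[j] if j in coalition else row[j] for j in range(n)]
            total += model(mixed)
        return total / len(background)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size that exclude feature i.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


# Synthetic held-out samples (1 = case, 0 = control), invented for illustration.
TEST_X = [[1.2, 0.3, 0.9], [0.8, -0.2, -1.0], [-1.0, 0.5, 0.2], [-0.6, -0.4, 1.5]]
TEST_Y = [1, 1, 0, 0]

test_auc = auc(TEST_Y, [predict(row) for row in TEST_X])

# Local explanations: one Shapley vector per sample (what a waterfall plot displays).
all_phi = [shapley_values(predict, row, TEST_X) for row in TEST_X]

# Global ranking: mean absolute Shapley value per feature, largest first.
mean_abs = [sum(abs(p[i]) for p in all_phi) / len(all_phi) for i in range(len(FEATURES))]
ranking = [name for _, name in sorted(zip(mean_abs, FEATURES), reverse=True)]

print("test AUC:", test_auc)
print("global ranking:", ranking)
```

For each sample, the Shapley values sum to the difference between that sample's prediction and the mean prediction over the background set, which is the additive property that waterfall and decision plots visualize.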

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5d23/11157651/931fc594146d/js3c00403_0001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5d23/11157651/3e7e38a74e71/js3c00403_0002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5d23/11157651/e9b1fc0e6c48/js3c00403_0003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5d23/11157651/81ed6d9dcf25/js3c00403_0004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5d23/11157651/96488bca1051/js3c00403_0005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5d23/11157651/c94f9faf0e46/js3c00403_0006.jpg
