Interaction Analysis Based on Shapley Values and Extreme Gradient Boosting: A Realistic Simulation and Application to a Large Epidemiological Prospective Study.

Authors

Orsini Nicola, Moore Alex, Wolk Alicja

Affiliations

Department of Global Public Health, Karolinska Institutet, Stockholm, Sweden.

Managed Self Ltd T/A Klarity, Bournemouth, United Kingdom.

Publication information

Front Nutr. 2022 Jul 18;9:871768. doi: 10.3389/fnut.2022.871768. eCollection 2022.

DOI:10.3389/fnut.2022.871768

PMID:35923201

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9340268/

Abstract

BACKGROUND

SHapley Additive exPlanations (SHAP) based on tree-based machine learning methods have been proposed to interpret interactions between exposures in observational studies, but their performance in realistic simulations is seldom evaluated.

METHODS

Data from population-based cohorts in Sweden of 47,770 men and women with complete baseline information on diet and lifestyle were used to inform a realistic simulation in 3 scenarios of small (OR = 0.75 vs. OR = 0.70), moderate (OR = 0.75 vs. OR = 0.65), and large (OR = 0.75 vs. OR = 0.60) discrepancies in the adjusted mortality odds ratios conferred by a healthy diet among men and among women. Estimates were obtained with logistic regression (L-OR_men, L-OR_women) and derived from SHAP values (S-OR_men, S-OR_women).
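The logistic-regression side of one scenario can be illustrated with a minimal sketch. It is not the authors' simulation: the baseline odds of death (0.25), the odds ratio for sex (0.8), the 50% prevalence of a healthy diet, the absence of other covariates, and the function names `simulate_cohort`, `fit_logistic`, and `sex_specific_ors` are all assumptions made for illustration. Only the cohort size and the scenario odds ratios come from the abstract. The model includes a diet-by-sex product term, so the odds ratio among men is exp(b_diet) and among women exp(b_diet + b_interaction).

```python
import numpy as np

def simulate_cohort(n, or_men, or_women, seed=0):
    """Simulate sex, diet, and death under a logistic model with sex-specific diet ORs."""
    rng = np.random.default_rng(seed)
    female = rng.integers(0, 2, size=n)  # 1 = woman, 0 = man
    diet = rng.integers(0, 2, size=n)    # 1 = healthy diet (assumed 50% prevalence)
    log_odds = (np.log(0.25)             # assumed baseline odds of death
                + np.log(0.8) * female   # assumed main effect of sex
                + np.log(or_men) * diet
                + np.log(or_women / or_men) * diet * female)
    death = rng.binomial(1, 1.0 / (1.0 + np.exp(-log_odds)))
    return female, diet, death

def fit_logistic(X, y, n_iter=25):
    """Maximum-likelihood logistic regression by Newton-Raphson."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        grad = X.T @ (y - p)
        hess = X.T @ (X * (p * (1.0 - p))[:, None])
        beta = beta + np.linalg.solve(hess, grad)
    return beta

def sex_specific_ors(female, diet, death):
    """Return (OR among men, OR among women) from a model with a diet x sex term."""
    X = np.column_stack(
        [np.ones(len(diet)), diet, female, diet * female]
    ).astype(float)
    beta = fit_logistic(X, death)
    return np.exp(beta[1]), np.exp(beta[1] + beta[3])

# "Large discrepancy" scenario from the abstract: OR = 0.75 vs. OR = 0.60.
female, diet, death = simulate_cohort(47770, or_men=0.75, or_women=0.60)
l_or_men, l_or_women = sex_specific_ors(female, diet, death)
print(f"L-OR_men = {l_or_men:.2f}, L-OR_women = {l_or_women:.2f}")
```

Because the model is saturated in two binary variables, the fitted odds ratios equal the cross-product ratios of the sex-specific 2x2 tables; with adjustment covariates, as in the study, that shortcut would no longer hold.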

RESULTS

The sensitivities of detecting small, moderate, and large discrepancies were 28, 83, and 100%, respectively. The sensitivities of a positive sign (L-OR_men > L-OR_women) in the 3 scenarios were 93, 100, and 100%, respectively. Similarly, the sensitivities of a positive discrepancy based on SHAP values (S-OR_men > S-OR_women) were 86, 99, and 100%, respectively.
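A sensitivity of this kind is the share of simulated datasets in which a criterion is met. The sketch below shows that calculation under stated assumptions, and it is not expected to reproduce the figures above: it treats "detection" as a two-sided Wald test of the difference in log odds ratios at the 5% level (the abstract does not state the criterion), uses unadjusted 2x2 tables with Woolf variances instead of adjusted models or SHAP values, and assumes equal numbers of men and women, a baseline odds of death of 0.25, and 50% prevalence of a healthy diet. The names `stratum_log_or` and `sensitivities` are hypothetical.

```python
import numpy as np
from math import erf, log, sqrt

def stratum_log_or(rng, n, p_diet, base_odds, odds_ratio):
    """Draw one diet x death table; return its log OR and Woolf variance."""
    n1 = rng.binomial(n, p_diet)              # people with a healthy diet
    n0 = n - n1
    p0 = base_odds / (1.0 + base_odds)        # risk of death, unhealthy diet
    odds1 = base_odds * odds_ratio
    p1 = odds1 / (1.0 + odds1)                # risk of death, healthy diet
    d1 = rng.binomial(n1, p1)
    d0 = rng.binomial(n0, p0)
    # Add 0.5 to every cell to guard against zero counts.
    a, b, c, d = (x + 0.5 for x in (d1, n1 - d1, d0, n0 - d0))
    return log(a * d / (b * c)), 1 / a + 1 / b + 1 / c + 1 / d

def sensitivities(or_men, or_women, n_per_sex=23885, n_rep=500, seed=1):
    """Return (share of replicates with p < 0.05, share with OR_men > OR_women)."""
    rng = np.random.default_rng(seed)
    detected = 0
    positive_sign = 0
    for _ in range(n_rep):
        lm, vm = stratum_log_or(rng, n_per_sex, 0.5, 0.25, or_men)
        lw, vw = stratum_log_or(rng, n_per_sex, 0.5, 0.25, or_women)
        z = (lm - lw) / sqrt(vm + vw)
        p_value = 1.0 - erf(abs(z) / sqrt(2.0))  # two-sided normal p-value
        detected += int(p_value < 0.05)
        positive_sign += int(lm > lw)
    return detected / n_rep, positive_sign / n_rep

for label, or_women in [("small", 0.70), ("moderate", 0.65), ("large", 0.60)]:
    det, sign = sensitivities(0.75, or_women)
    print(f"{label}: detection = {det:.0%}, positive sign = {sign:.0%}")
```

The sketch illustrates why the two sensitivities can diverge: the sign criterion only requires the estimated difference to fall on the correct side of zero, whereas detection requires it to clear a significance threshold.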

CONCLUSIONS

In a realistic simulation study, the ability of the SHAP values to detect an interaction effect was proportional to its magnitude. In contrast, the ability to identify the sign or direction of such interaction effect was very high in all the simulated scenarios.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d398/9340268/02e289fecb4d/fnut-09-871768-g0001.jpg
