

Interaction Analysis Based on Shapley Values and Extreme Gradient Boosting: A Realistic Simulation and Application to a Large Epidemiological Prospective Study.

Authors

Orsini Nicola, Moore Alex, Wolk Alicja

Affiliations

Department of Global Public Health, Karolinska Institutet, Stockholm, Sweden.

Managed Self Ltd T/A Klarity, Bournemouth, United Kingdom.

Publication information

Front Nutr. 2022 Jul 18;9:871768. doi: 10.3389/fnut.2022.871768. eCollection 2022.

Abstract

BACKGROUND

SHapley Additive exPlanations (SHAP) based on tree-based machine learning methods have been proposed to interpret interactions between exposures in observational studies, but their performance in realistic simulations is seldom evaluated.

METHODS

Data from population-based cohorts in Sweden, comprising 47,770 men and women with complete baseline information on diet and lifestyle, were used to inform a realistic simulation under 3 scenarios of small (OR_men = 0.75 vs. OR_women = 0.70), moderate (OR_men = 0.75 vs. OR_women = 0.65), and large (OR_men = 0.75 vs. OR_women = 0.60) discrepancies in the adjusted mortality odds ratios conferred by a healthy diet among men and among women. Estimates were obtained with logistic regression (L-OR_men, L-OR_women) and derived from SHAP values (S-OR_men, S-OR_women).
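To make the logistic-regression side of this design concrete, here is a minimal sketch in Python (NumPy only). It is not the authors' simulation: the real study drew on observed cohort covariates, whereas this sketch assumes a binary sex, a binary healthy-diet indicator assigned at random, and a hypothetical baseline mortality odds of 0.25. The function names and the assignment of the first odds ratio to men and the second to women are illustrative assumptions.

```python
import numpy as np

def simulate_cohort(n, or_men, or_women, baseline_odds=0.25, seed=0):
    """Simulate sex, diet, and death with a sex-specific diet odds ratio.

    baseline_odds is a hypothetical value, not taken from the study.
    """
    rng = np.random.default_rng(seed)
    female = rng.integers(0, 2, size=n)   # 0 = man, 1 = woman
    diet = rng.integers(0, 2, size=n)     # 1 = healthy diet
    log_odds = (np.log(baseline_odds)
                + np.log(or_men) * diet * (1 - female)
                + np.log(or_women) * diet * female)
    prob = 1.0 / (1.0 + np.exp(-log_odds))
    died = rng.binomial(1, prob)
    return female, diet, died

def fit_logistic(X, y, n_iter=25):
    """Maximum-likelihood logistic regression via Newton-Raphson."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        weights = p * (1.0 - p)
        gradient = X.T @ (y - p)
        hessian = X.T @ (X * weights[:, None])
        beta = beta + np.linalg.solve(hessian, gradient)
    return beta

def sex_specific_ors(female, diet, died):
    """Diet odds ratios for men and women from a model with a diet-by-sex product term."""
    X = np.column_stack(
        [np.ones_like(diet), diet, female, diet * female]
    ).astype(float)
    beta = fit_logistic(X, died.astype(float))
    or_men = np.exp(beta[1])              # diet effect when female = 0
    or_women = np.exp(beta[1] + beta[3])  # diet effect when female = 1
    return or_men, or_women

# "Large" scenario from the abstract: 0.75 among men vs. 0.60 among women.
female, diet, died = simulate_cohort(47770, or_men=0.75, or_women=0.60)
l_or_men, l_or_women = sex_specific_ors(female, diet, died)
print(f"L-OR men: {l_or_men:.2f}, L-OR women: {l_or_women:.2f}")
```

The product term `diet * female` is what carries the interaction: its exponentiated coefficient is the ratio of the women's odds ratio to the men's.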

RESULTS

The sensitivities for detecting small, moderate, and large discrepancies were 28, 83, and 100%, respectively. The sensitivities of a positive sign (L-OR_men > L-OR_women) in the 3 scenarios were 93, 100, and 100%, respectively. Similarly, the sensitivities of a positive discrepancy based on SHAP values (S-OR_men > S-OR_women) were 86, 99, and 100%, respectively.
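One way to read these sensitivities is as the share of simulated replicates in which a criterion is met. The sketch below illustrates that idea under explicit assumptions that do not come from the abstract: "detection" is taken to mean a two-sided Wald test of the difference in log odds ratios at the 5% level, the data are reduced to four equally likely sex-by-diet cells, and the baseline mortality odds is a hypothetical 0.25. It is not expected to reproduce the reported percentages.

```python
import numpy as np

def simulate_tables(n, or_men, or_women, baseline_odds, rng):
    """Draw death and survivor counts for the four sex-by-diet cells.

    Cell order: men/unhealthy, men/healthy, women/unhealthy, women/healthy.
    """
    cell_sizes = rng.multinomial(n, [0.25] * 4)
    odds = baseline_odds * np.array([1.0, or_men, 1.0, or_women])
    deaths = rng.binomial(cell_sizes, odds / (1.0 + odds))
    return deaths, cell_sizes - deaths

def log_or_and_var(deaths, alive, unexposed, exposed):
    """Log odds ratio and its Woolf variance from one 2x2 table."""
    log_or = np.log(deaths[exposed] * alive[unexposed]
                    / (alive[exposed] * deaths[unexposed]))
    var = (1 / deaths[exposed] + 1 / alive[exposed]
           + 1 / deaths[unexposed] + 1 / alive[unexposed])
    return log_or, var

def sensitivities(or_men, or_women, n=47770, replicates=500,
                  baseline_odds=0.25, seed=1):
    """Share of replicates with a detected discrepancy and with a positive sign."""
    rng = np.random.default_rng(seed)
    detected = 0
    positive = 0
    for _ in range(replicates):
        deaths, alive = simulate_tables(n, or_men, or_women, baseline_odds, rng)
        log_or_m, var_m = log_or_and_var(deaths, alive, 0, 1)
        log_or_w, var_w = log_or_and_var(deaths, alive, 2, 3)
        z = (log_or_m - log_or_w) / np.sqrt(var_m + var_w)
        detected += int(abs(z) > 1.96)        # assumed detection rule
        positive += int(log_or_m > log_or_w)  # estimated OR_men > OR_women
    return detected / replicates, positive / replicates

for label, or_women in [("small", 0.70), ("moderate", 0.65), ("large", 0.60)]:
    detect, sign = sensitivities(0.75, or_women)
    print(f"{label}: detected {detect:.0%}, positive sign {sign:.0%}")
```

The sign criterion is much easier to satisfy than the detection criterion, because it only requires the estimated difference to fall on the correct side of zero rather than to clear a significance threshold.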

CONCLUSIONS

In a realistic simulation study, the ability of the SHAP values to detect an interaction effect was proportional to its magnitude. In contrast, the ability to identify the sign or direction of such interaction effect was very high in all the simulated scenarios.


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d398/9340268/02e289fecb4d/fnut-09-871768-g0001.jpg
