Orsini Nicola, Moore Alex, Wolk Alicja
Department of Global Public Health, Karolinska Institutet, Stockholm, Sweden.
Managed Self Ltd T/A Klarity, Bournemouth, United Kingdom.
Front Nutr. 2022 Jul 18;9:871768. doi: 10.3389/fnut.2022.871768. eCollection 2022.
SHapley Additive exPlanations (SHAP) based on tree-based machine learning methods have been proposed to interpret interactions between exposures in observational studies, but their performance in realistic simulations is seldom evaluated.
Data from population-based cohorts in Sweden of 47,770 men and women with complete baseline information on diet and lifestyles were used to inform a realistic simulation in 3 scenarios of small (OR = 0.75 vs. OR = 0.70), moderate (OR = 0.75 vs. OR = 0.65), and large (OR = 0.75 vs. OR = 0.60) discrepancies in the adjusted mortality odds ratios conferred by a healthy diet among men and among women. Estimates were obtained with logistic regression (L-OR_men, L-OR_women) and derived from SHAP values (S-OR_men, S-OR_women).
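The three scenarios can be illustrated with a minimal sketch, which is not the authors' simulation code. It assumes a single binary exposure (healthy diet) with no other covariates, in which case the maximum-likelihood odds ratio from a logistic regression within one sex equals the cross-product ratio of the 2x2 table. The diet prevalence, the baseline mortality odds, the even split of participants by sex, and all function names are hypothetical choices made only for illustration.

```python
import math
import random

# Hypothetical inputs, not taken from the study: diet prevalence, baseline
# mortality odds, and an even split of the 47,770 participants by sex.
P_DIET = 0.5
BASELINE_ODDS = 0.25
N_PER_SEX = 47_770 // 2

OR_MEN = 0.75                                                  # same in all scenarios
SCENARIOS = {"small": 0.70, "moderate": 0.65, "large": 0.60}   # OR among women


def simulate_stratum(n, or_diet, rng):
    """Simulate n people of one sex; return counts keyed by (diet, died)."""
    counts = {(1, 1): 0, (1, 0): 0, (0, 1): 0, (0, 0): 0}
    for _ in range(n):
        diet = 1 if rng.random() < P_DIET else 0
        # Mortality odds are multiplied by the diet OR among the exposed.
        odds = BASELINE_ODDS * (or_diet if diet else 1.0)
        died = 1 if rng.random() < odds / (1.0 + odds) else 0
        counts[(diet, died)] += 1
    return counts


def odds_ratio(counts):
    """Cross-product ratio: (exposed deaths * unexposed survivors) /
    (exposed survivors * unexposed deaths)."""
    return (counts[(1, 1)] * counts[(0, 0)]) / (counts[(1, 0)] * counts[(0, 1)])


def run_scenarios(seed=2022):
    """Return {scenario: (estimated OR in men, estimated OR in women)}."""
    rng = random.Random(seed)
    results = {}
    for name, or_women in SCENARIOS.items():
        or_men_hat = odds_ratio(simulate_stratum(N_PER_SEX, OR_MEN, rng))
        or_women_hat = odds_ratio(simulate_stratum(N_PER_SEX, or_women, rng))
        results[name] = (or_men_hat, or_women_hat)
    return results


for name, (or_men_hat, or_women_hat) in run_scenarios().items():
    print(f"{name:9s} OR men = {or_men_hat:.3f}  OR women = {or_women_hat:.3f}")
```

The study's estimates were adjusted for other baseline factors, so this unadjusted sketch only shows how the sex-specific discrepancy enters the data-generating model.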
The sensitivities of detecting small, moderate, and large discrepancies were 28, 83, and 100%, respectively. The sensitivities of a positive sign (L-OR_men > L-OR_women) in the 3 scenarios were 93, 100, and 100%, respectively. Similarly, the sensitivities of a positive discrepancy based on SHAP values (S-OR_men > S-OR_women) were 86, 99, and 100%, respectively.
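A sensitivity of this kind is the proportion of simulated replicates in which a criterion is met. The sketch below shows one way to compute it and is not expected to reproduce the reported percentages. The detection criterion used here (a two-sided Wald test at the 5% level on the difference of log odds ratios) is an assumption, since the abstract does not state the criterion. The cell size, the baseline odds, the 0.5 continuity correction (Haldane-Anscombe), and the function names are likewise hypothetical.

```python
import math
import random

BASELINE_ODDS = 0.25   # hypothetical baseline mortality odds
N_PER_CELL = 3000      # hypothetical number of people per sex and diet group


def draw_deaths(n, odds, rng):
    """Number of deaths among n people who share the same mortality odds."""
    risk = odds / (1.0 + odds)
    return sum(rng.random() < risk for _ in range(n))


def log_or_and_var(or_true, n, rng):
    """Simulate one sex; return the estimated log OR and its Woolf variance."""
    d1 = draw_deaths(n, BASELINE_ODDS * or_true, rng)   # deaths, healthy diet
    d0 = draw_deaths(n, BASELINE_ODDS, rng)             # deaths, reference diet
    # Haldane-Anscombe correction: add 0.5 to every cell to avoid zero counts.
    a, b = d1 + 0.5, n - d1 + 0.5
    c, d = d0 + 0.5, n - d0 + 0.5
    return math.log((a * d) / (b * c)), 1 / a + 1 / b + 1 / c + 1 / d


def estimate_sensitivity(or_men, or_women, reps=50, n=N_PER_CELL, seed=0):
    """Return (share of replicates with a detected discrepancy,
    share of replicates with OR_men > OR_women)."""
    rng = random.Random(seed)
    detected = positive = 0
    for _ in range(reps):
        log_m, var_m = log_or_and_var(or_men, n, rng)
        log_w, var_w = log_or_and_var(or_women, n, rng)
        z = (log_m - log_w) / math.sqrt(var_m + var_w)
        detected += abs(z) > 1.96      # assumed criterion: Wald test, 5% level
        positive += log_m > log_w      # positive sign of the discrepancy
    return detected / reps, positive / reps


for label, or_women in [("small", 0.70), ("moderate", 0.65), ("large", 0.60)]:
    detect, sign = estimate_sensitivity(0.75, or_women)
    print(f"{label:9s} detection = {detect:.2f}  positive sign = {sign:.2f}")
```

Separating the two counters mirrors the distinction drawn in the results: identifying the direction of a discrepancy is a weaker requirement than declaring the discrepancy detected.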
In a realistic simulation study, the ability of the SHAP values to detect an interaction effect increased with its magnitude. In contrast, the ability to identify the sign or direction of such an interaction effect was very high in all the simulated scenarios.