

Efficient nonparametric statistical inference on population feature importance using Shapley values.

Authors

Williamson Brian D, Feng Jean

Affiliations

Vaccine and Infectious Disease Division, Fred Hutchinson Cancer Research Center, Seattle, WA.

Department of Biostatistics, University of Washington, Seattle, WA.

Publication

Proc Mach Learn Res. 2020 Jul;119:10282-10291.

PMID:33884372
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8057672/
Abstract

The true population-level importance of a variable in a prediction task provides useful knowledge about the underlying data-generating mechanism and can help in deciding which measurements to collect in subsequent experiments. Valid statistical inference on this importance is a key component in understanding the population of interest. We present a computationally efficient procedure for estimating and obtaining valid statistical inference on the Shapley Population Variable Importance Measure (SPVIM). Although the computational complexity of the true SPVIM scales exponentially with the number of variables, we propose an estimator based on randomly sampling only Θ(n) feature subsets given n observations. We prove that our estimator converges at an asymptotically optimal rate. Moreover, by deriving the asymptotic distribution of our estimator, we construct valid confidence intervals and hypothesis tests. Our procedure has good finite-sample performance in simulations, and for an in-hospital mortality prediction task produces similar variable importance estimates when different machine learning algorithms are applied.
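
To make the computational point in the abstract concrete, here is a minimal sketch of why exact Shapley-based importance is exponential in the number of variables and how sampling avoids full enumeration. It is not the authors' SPVIM estimator: it swaps in plain permutation sampling, which is a simpler and widely used Monte Carlo approximation, and it does not produce confidence intervals. The function names and the `toy_predictiveness` value function (a made-up predictiveness score for each feature subset) are hypothetical and exist only for illustration.

```python
import itertools
import math
import random


def exact_shapley(value, p):
    """Exact Shapley values; enumerates 2^(p-1) subsets for each feature."""
    phi = [0.0] * p
    for j in range(p):
        others = [k for k in range(p) if k != j]
        for size in range(p):
            # Shapley weight for subsets of this size: |s|! (p-|s|-1)! / p!
            weight = (math.factorial(size) * math.factorial(p - size - 1)
                      / math.factorial(p))
            for combo in itertools.combinations(others, size):
                s = frozenset(combo)
                phi[j] += weight * (value(s | {j}) - value(s))
    return phi


def sampled_shapley(value, p, n_perms, seed=0):
    """Monte Carlo approximation: average marginal gains over random orderings."""
    rng = random.Random(seed)
    phi = [0.0] * p
    for _ in range(n_perms):
        order = list(range(p))
        rng.shuffle(order)
        s = frozenset()
        prev = value(s)
        for j in order:
            s = s | {j}
            cur = value(s)
            # Gain in predictiveness from adding feature j to the current subset.
            phi[j] += (cur - prev) / n_perms
            prev = cur
    return phi


def toy_predictiveness(s):
    """Hypothetical predictiveness V(s) of feature subset s (illustration only)."""
    base = {0: 0.30, 1: 0.20, 2: 0.05, 3: 0.0}
    v = sum(base[j] for j in s)
    if 0 in s and 1 in s:
        v -= 0.10  # assume features 0 and 1 carry partly redundant information
    return v


if __name__ == "__main__":
    print("exact:  ", exact_shapley(toy_predictiveness, 4))
    print("sampled:", sampled_shapley(toy_predictiveness, 4, n_perms=2000))
```

In this toy setup the redundancy penalty is split evenly between features 0 and 1, and the sampled values always sum to V(all features) minus V(no features), because each ordering contributes a telescoping sum. In a real analysis, V(s) would itself have to be estimated from data, for example by fitting a prediction model on each sampled subset, which is where the cost of every extra subset comes from.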


Similar articles

1
Efficient nonparametric statistical inference on population feature importance using Shapley values.
Proc Mach Learn Res. 2020 Jul;119:10282-10291.
2
Nonparametric variable importance assessment using machine learning techniques.
Biometrics. 2021 Mar;77(1):9-22. doi: 10.1111/biom.13392. Epub 2020 Dec 8.
3
A general framework for inference on algorithm-agnostic variable importance.
J Am Stat Assoc. 2023;118(543):1645-1658. doi: 10.1080/01621459.2021.2003200. Epub 2022 Jan 5.
4
Collaborative double robust targeted maximum likelihood estimation.
Int J Biostat. 2010 May 17;6(1):Article 17. doi: 10.2202/1557-4679.1181.
5
A Generally Efficient Targeted Minimum Loss Based Estimator based on the Highly Adaptive Lasso.
Int J Biostat. 2017 Oct 12;13(2):/j/ijb.2017.13.issue-2/ijb-2015-0097/ijb-2015-0097.xml. doi: 10.1515/ijb-2015-0097.
6
Optimal Nonparametric Inference with Two-Scale Distributional Nearest Neighbors.
J Am Stat Assoc. 2024;119(545):297-307. doi: 10.1080/01621459.2022.2115375. Epub 2022 Oct 5.
7
Shapley variable importance cloud for interpretable machine learning.
Patterns (N Y). 2022 Feb 22;3(4):100452. doi: 10.1016/j.patter.2022.100452. eCollection 2022 Apr 8.
8
Nonparametric bootstrap inference for the targeted highly adaptive least absolute shrinkage and selection operator (LASSO) estimator.
Int J Biostat. 2020 Aug 10. doi: 10.1515/ijb-2017-0070.
9
Standard errors and confidence intervals for variable importance in random forest regression, classification, and survival.
Stat Med. 2019 Feb 20;38(4):558-582. doi: 10.1002/sim.7803. Epub 2018 Jun 4.

Cited by

1
Predicting Long COVID in the National COVID Cohort Collaborative Using Super Learner: Cohort Study.
JMIR Public Health Surveill. 2024 Aug 15;10:e53322. doi: 10.2196/53322.
2
Predicting risk of preterm birth in singleton pregnancies using machine learning algorithms.
Front Big Data. 2024 Feb 29;7:1291196. doi: 10.3389/fdata.2024.1291196. eCollection 2024.
3
Flexible variable selection in the presence of missing data.
Int J Biostat. 2024 Feb 13;20(2):347-359. doi: 10.1515/ijb-2023-0059. eCollection 2024 Nov 1.
4
A general framework for inference on algorithm-agnostic variable importance.
J Am Stat Assoc. 2023;118(543):1645-1658. doi: 10.1080/01621459.2021.2003200. Epub 2022 Jan 5.
5
Prediction of opioid-related outcomes in a medicaid surgical population: Evidence to guide postoperative opiate therapy and monitoring.
PLoS Comput Biol. 2023 Aug 14;19(8):e1011376. doi: 10.1371/journal.pcbi.1011376. eCollection 2023 Aug.
6
Machine Learning-Based Diagnosis and Ranking of Risk Factors for Diabetic Retinopathy in Population-Based Studies from South India.
Diagnostics (Basel). 2023 Jun 16;13(12):2084. doi: 10.3390/diagnostics13122084.
7
Explaining a series of models by propagating Shapley values.
Nat Commun. 2022 Aug 3;13(1):4512. doi: 10.1038/s41467-022-31384-3.
8
Clinical artificial intelligence quality improvement: towards continual monitoring and updating of AI algorithms in healthcare.
NPJ Digit Med. 2022 May 31;5(1):66. doi: 10.1038/s41746-022-00611-y.
9
Marginal Contribution Feature Importance - an Axiomatic Approach for Explaining Data.
Proc Mach Learn Res. 2021 Jul;139:1324-1335.

References

1
A general framework for inference on algorithm-agnostic variable importance.
J Am Stat Assoc. 2023;118(543):1645-1658. doi: 10.1080/01621459.2021.2003200. Epub 2022 Jan 5.
2
Nonparametric variable importance assessment using machine learning techniques.
Biometrics. 2021 Mar;77(1):9-22. doi: 10.1111/biom.13392. Epub 2020 Dec 8.
3
From Local Explanations to Global Understanding with Explainable AI for Trees.
Nat Mach Intell. 2020 Jan;2(1):56-67. doi: 10.1038/s42256-019-0138-9. Epub 2020 Jan 17.
4
Definitions, methods, and applications in interpretable machine learning.
Proc Natl Acad Sci U S A. 2019 Oct 29;116(44):22071-22080. doi: 10.1073/pnas.1900654116. Epub 2019 Oct 16.
5
Predicting In-Hospital Mortality of ICU Patients: The PhysioNet/Computing in Cardiology Challenge 2012.
Comput Cardiol (2010). 2012;39:245-248.
6
A model for immunological correlates of protection.
Stat Med. 2006 May 15;25(9):1485-97. doi: 10.1002/sim.2282.