A general framework for inference on algorithm-agnostic variable importance.

Author information

Williamson Brian D, Gilbert Peter B, Simon Noah R, Carone Marco

Affiliations

Vaccine and Infectious Disease Division, Fred Hutchinson Cancer Research Center.

Department of Biostatistics, University of Washington.

Publication information

J Am Stat Assoc. 2023;118(543):1645-1658. doi: 10.1080/01621459.2021.2003200. Epub 2022 Jan 5.

DOI:10.1080/01621459.2021.2003200

PMID:37982008

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10652709/

Abstract

In many applications, it is of interest to assess the relative contribution of features (or subsets of features) toward the goal of predicting a response - in other words, to gauge the variable importance of features. Most recent work on variable importance assessment has focused on describing the importance of features within the confines of a given prediction algorithm. However, such assessment does not necessarily characterize the prediction potential of features, and may provide a misleading reflection of the intrinsic value of these features. To address this limitation, we propose a general framework for nonparametric inference on interpretable algorithm-agnostic variable importance. We define variable importance as a population-level contrast between the oracle predictiveness of all available features versus all features except those under consideration. We propose a nonparametric efficient estimation procedure that allows the construction of valid confidence intervals, even when machine learning techniques are used. We also outline a valid strategy for testing the null importance hypothesis. Through simulations, we show that our proposal has good operating characteristics, and we illustrate its use with data from a study of an antibody against HIV-1 infection.
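The definition in the abstract, importance as the gap between the best achievable predictiveness with all features and without the features of interest, can be illustrated with a small simulation. The sketch below is not the authors' estimator: it assumes R-squared as the predictiveness measure, substitutes ordinary least squares for a flexible machine learning algorithm, uses a single train/test split, and omits the efficient estimation and confidence-interval construction the abstract refers to. All function names (`fit_ols`, `r_squared`, `importance`) are hypothetical.

```python
import random

def fit_ols(X, y):
    """Least-squares fit with intercept via the normal equations (pure Python).
    Stands in for any prediction algorithm; this choice is an assumption."""
    rows = [[1.0] + list(x) for x in X]
    p = len(rows[0])
    # Augmented matrix [A'A | A'y]
    M = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting
    for c in range(p):
        piv = max(range(c, p), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(p):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [a - f * b for a, b in zip(M[r], M[c])]
    beta = [M[i][p] / M[i][i] for i in range(p)]
    return lambda x: beta[0] + sum(b * v for b, v in zip(beta[1:], x))

def r_squared(f, X, y):
    """Predictiveness measure (assumed here): 1 - MSE / Var(y)."""
    mean_y = sum(y) / len(y)
    mse = sum((yi - f(x)) ** 2 for x, yi in zip(X, y)) / len(y)
    var = sum((yi - mean_y) ** 2 for yi in y) / len(y)
    return 1.0 - mse / var

def importance(X, y, drop):
    """Held-out R-squared with all features minus held-out R-squared
    without the feature indices in `drop`."""
    half = len(y) // 2
    keep = [j for j in range(len(X[0])) if j not in drop]
    subset = lambda rows: [[r[j] for j in keep] for r in rows]
    X_tr, y_tr, X_te, y_te = X[:half], y[:half], X[half:], y[half:]
    full = fit_ols(X_tr, y_tr)
    reduced = fit_ols(subset(X_tr), y_tr)
    return r_squared(full, X_te, y_te) - r_squared(reduced, subset(X_te), y_te)

# Simulated data: y depends on x1 only, so in the population the
# R-squared importance of x1 is 4/5 = 0.8 and that of x2 is 0.
random.seed(0)
X = [[random.gauss(0, 1), random.gauss(0, 1)] for _ in range(2000)]
y = [2.0 * x1 + random.gauss(0, 1) for x1, _ in X]

imp_x1 = importance(X, y, drop={0})
imp_x2 = importance(X, y, drop={1})
print(f"importance of x1: {imp_x1:.3f}, importance of x2: {imp_x2:.3f}")
```

Because both fits are evaluated on data not used for training, the estimate for an unimportant feature can come out slightly negative; this is sampling noise rather than a sign of a bug.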
