All Models are Wrong, but are Useful: Learning a Variable's Importance by Studying an Entire Class of Prediction Models Simultaneously.

Author information

Aaron Fisher, Cynthia Rudin, Francesca Dominici

Affiliations

Takeda Pharmaceuticals, Cambridge, MA 02139, USA.

Departments of Computer Science and Electrical and Computer Engineering, Duke University, Durham, NC 27708, USA.

Publication information

J Mach Learn Res. 2019;20.

PMID: 34335110

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8323609/

Abstract

Variable importance (VI) tools describe how much covariates contribute to a prediction model's accuracy. However, variables that are important for one well-performing model (for example, a linear model $f(x) = x^\top\beta$ with a fixed coefficient vector $\beta$) may be unimportant for another. In this paper, we propose model class reliance (MCR) as the range of VI values across well-performing models in a prespecified class. Thus, MCR gives a more comprehensive description of importance by accounting for the fact that many prediction models, possibly of different parametric forms, may fit the data well. In the process of deriving MCR, we show several informative results for permutation-based VI estimates, based on the VI measures used in Random Forests. Specifically, we derive connections between permutation importance estimates for a prediction model, U-statistics, conditional variable importance, conditional causal effects, and linear model coefficients. We then give probabilistic bounds for MCR, using a novel, generalizable technique. We apply MCR to a public data set of Broward County criminal records to study the reliance of recidivism prediction models on sex and race. In this application, MCR can be used to help inform VI for unknown, proprietary models.
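
To make the abstract's two central ideas concrete (a permutation-based reliance measure computed over all pairs of observations, and MCR as the range of that measure over well-performing models), here is a minimal Python sketch under several stated assumptions. It assumes squared-error loss, a tiny class of linear models $f(x) = x^\top\beta$ enumerated on a grid around the least-squares fit, and reliance taken as the ratio of the "switched" loss to the original loss. The function names, the toy data, the grid, and the value of epsilon are all hypothetical illustration choices, and brute-force grid search is a stand-in for, not a reproduction of, the paper's own procedures for computing MCR.

```python
import itertools

import numpy as np


def e_orig(beta, X, y):
    """Empirical squared-error loss of the linear model f(x) = x @ beta."""
    return np.mean((y - X @ beta) ** 2)


def e_switch(beta, X, y, cols):
    """Mean squared-error loss over all ordered pairs i != j when row i keeps its
    outcome y_i and its other covariates but takes row j's values in `cols`.
    Averaging over every i != j pair makes this a U-statistic."""
    n = X.shape[0]
    total = 0.0
    for j in range(n):
        X_sw = X.copy()
        X_sw[:, cols] = X[j, cols]            # every row i borrows row j's `cols`
        losses = (y - X_sw @ beta) ** 2
        total += losses.sum() - losses[j]     # drop the i == j term
    return total / (n * (n - 1))


def model_reliance(beta, X, y, cols):
    """Ratio form of reliance (assumed here): switched loss over original loss."""
    return e_switch(beta, X, y, cols) / e_orig(beta, X, y)


def empirical_mcr(betas, X, y, cols, beta_ref, epsilon):
    """Min and max reliance over candidate models whose empirical loss is within
    `epsilon` of the reference model's loss (a brute-force set of well-performing
    models). The reference model is always included, so the set is never empty."""
    threshold = e_orig(beta_ref, X, y) + epsilon
    candidates = [beta_ref, *betas]
    mrs = [model_reliance(b, X, y, cols)
           for b in candidates if e_orig(b, X, y) <= threshold]
    return min(mrs), max(mrs)


# Hypothetical toy data: x2 is correlated with x1, and y depends on both.
rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = 0.8 * x1 + 0.6 * rng.normal(size=n)
X = np.column_stack([x1, x2])
y = x1 + x2 + rng.normal(scale=0.5, size=n)

# Reference model: ordinary least squares; candidates: a grid of perturbations.
beta_ref, *_ = np.linalg.lstsq(X, y, rcond=None)
grid = np.linspace(-1.0, 1.0, 11)
betas = [beta_ref + np.array([d1, d2]) for d1, d2 in itertools.product(grid, grid)]

low, high = empirical_mcr(betas, X, y, cols=[0], beta_ref=beta_ref, epsilon=0.1)
print(f"Empirical MCR for x1 (cols=[0]): [{low:.2f}, {high:.2f}]")
```

A model that puts zero weight on the switched covariate has a reliance of exactly 1, because switching that column leaves its predictions unchanged. Because x1 and x2 are correlated in this toy setup, near-optimal models can shift weight between them, which is the situation in which a range of reliance values is more informative than any single model's value. The loop form of `e_switch` works for any prediction function but costs O(n²) evaluations, and grid enumeration is only practical for very small model classes.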
