All Models are Wrong, but Many are Useful: Learning a Variable's Importance by Studying an Entire Class of Prediction Models Simultaneously.
Author information
Fisher Aaron, Rudin Cynthia, Dominici Francesca
Affiliations
Takeda Pharmaceuticals, Cambridge, MA 02139, USA.
Departments of Computer Science and Electrical and Computer Engineering, Duke University, Durham, NC 27708, USA.
Publication information
J Mach Learn Res. 2019;20.
Variable importance (VI) tools describe how much covariates contribute to a prediction model's accuracy. However, important variables for one well-performing model (for example, a linear model $f(x) = x^\top \beta$ with a fixed coefficient vector $\beta$) may be unimportant for another model. In this paper, we propose model class reliance (MCR) as the range of VI values across all well-performing models in a prespecified class. Thus, MCR gives a more comprehensive description of importance by accounting for the fact that many prediction models, possibly of different parametric forms, may fit the data well. In the process of deriving MCR, we show several informative results for permutation-based VI estimates, based on the VI measures used in Random Forests. Specifically, we derive connections between permutation importance estimates for a prediction model, U-statistics, conditional variable importance, conditional causal effects, and linear model coefficients. We then give probabilistic bounds for MCR, using a novel, generalizable technique. We apply MCR to a public data set of Broward County criminal records to study the reliance of recidivism prediction models on sex and race. In this application, MCR can be used to help inform VI for unknown, proprietary models.
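The idea of MCR as a range of VI values can be illustrated with a minimal sketch. It is not the paper's estimator. It rests on several assumptions that do not come from the abstract: the model class is replaced by a small, hand-picked list of linear coefficient vectors; a variable's importance is taken to be the ratio of mean squared error after one random shuffle of that column to the original error; "well-performing" means loss within a tolerance `eps` of the best candidate; and the data are simulated. The names `model_reliance` and `empirical_mcr` are hypothetical.

```python
import numpy as np

def mse(y, pred):
    """Mean squared error."""
    return float(np.mean((y - pred) ** 2))

def model_reliance(beta, X, y, col, rng):
    """Permutation-based reliance of the linear model X @ beta on column `col`:
    loss after shuffling that column, divided by the original loss."""
    base = mse(y, X @ beta)
    X_perm = X.copy()
    X_perm[:, col] = rng.permutation(X_perm[:, col])
    return mse(y, X_perm @ beta) / base

def empirical_mcr(candidates, X, y, col, eps, rng):
    """Range of reliance values over the candidates whose loss is within
    `eps` of the best candidate (an assumed stand-in for a reference model)."""
    losses = np.array([mse(y, X @ b) for b in candidates])
    good = [b for b, loss in zip(candidates, losses) if loss <= losses.min() + eps]
    values = [model_reliance(b, X, y, col, rng) for b in good]
    return min(values), max(values)

# Simulated data (an assumption): two nearly interchangeable covariates.
rng = np.random.default_rng(0)
n = 2000
x1 = rng.normal(size=n)
x2 = x1 + 0.1 * rng.normal(size=n)
X = np.column_stack([x1, x2])
y = x1 + 0.5 * rng.normal(size=n)

# Hand-picked candidate coefficient vectors; the last one fits poorly
# and is filtered out by the eps threshold.
candidates = [
    np.array([1.0, 0.0]),
    np.array([0.0, 1.0]),
    np.array([0.5, 0.5]),
    np.array([0.0, 0.0]),
]

lo, hi = empirical_mcr(candidates, X, y, col=0, eps=0.05, rng=rng)
print(f"reliance on the first covariate ranges from {lo:.2f} to {hi:.2f}")
```

Because the two covariates are almost copies of each other, one well-fitting candidate ignores the first covariate entirely while another depends on it heavily, so the range is wide. This is the situation the abstract describes, where a variable important for one well-performing model may be unimportant for another.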