Luo Chuji, Daniels Michael J
Google LLC, Mountain View, California 94043, USA.
Department of Statistics, University of Florida, Gainesville, Florida 32611, USA.
Stat Sci. 2024 May;39(2):286-304. doi: 10.1214/23-sts900. Epub 2024 May 5.
Variable selection is an important statistical problem. It becomes more challenging when the candidate predictors are of mixed type (e.g., continuous and binary) and affect the response variable in nonlinear and/or non-additive ways. In this paper, we review existing variable selection approaches for the Bayesian additive regression trees (BART) model, a nonparametric regression model that is flexible enough to capture both interactions between predictors and nonlinear relationships with the response. The review emphasizes how well each approach identifies the relevant predictors. We also propose two variable importance measures that can be used in a permutation-based variable selection approach, and a backward variable selection procedure for BART. We introduce these variations to illustrate the limitations of current approaches and the opportunities for improving them, and we assess them via simulations.
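The abstract does not spell out the permutation-based procedure, so the following is only a minimal sketch of the general idea, under stated assumptions: an importance score is computed for each predictor on the observed data, the response is then permuted many times to build a null distribution of scores, and a predictor is selected when its observed score is extreme relative to that null. The function names (`abs_correlation_importance`, `permutation_select`), the p-value rule, and the use of absolute correlation as a stand-in importance measure are all hypothetical choices for illustration; they are not the measures proposed in the paper, and no BART model is fitted here.

```python
import random


def abs_correlation_importance(X, y):
    """Stand-in importance score: |Pearson correlation| of each column with y.

    Assumption: a real analysis would plug in a BART-derived importance
    measure here; this placeholder only keeps the sketch self-contained.
    """
    n = len(y)
    mean_y = sum(y) / n
    sd_y = sum((v - mean_y) ** 2 for v in y) ** 0.5
    scores = []
    for j in range(len(X[0])):
        col = [row[j] for row in X]
        mean_x = sum(col) / n
        sd_x = sum((v - mean_x) ** 2 for v in col) ** 0.5
        cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(col, y))
        scores.append(abs(cov / (sd_x * sd_y)) if sd_x > 0 and sd_y > 0 else 0.0)
    return scores


def permutation_select(X, y, importance, n_perm=200, alpha=0.05, seed=0):
    """Select predictors whose observed importance is extreme under a
    null distribution obtained by permuting the response."""
    rng = random.Random(seed)
    observed = importance(X, y)
    null_scores = [[] for _ in observed]

    y_perm = list(y)
    for _ in range(n_perm):
        rng.shuffle(y_perm)  # break any link between predictors and response
        for j, score in enumerate(importance(X, y_perm)):
            null_scores[j].append(score)

    selected = []
    for j, obs in enumerate(observed):
        # Permutation p-value with the usual +1 correction.
        exceed = sum(1 for s in null_scores[j] if s >= obs)
        p_value = (exceed + 1) / (n_perm + 1)
        if p_value <= alpha:
            selected.append(j)
    return selected
```

In this sketch the `importance` argument is the only model-specific piece, so a score computed from a fitted BART model could be substituted without changing the selection logic; each permutation would then require refitting the model, which is the main computational cost of such an approach.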