Cammarota Camillo, Pinto Alessandro
Department of Mathematics, "Sapienza" University of Rome, Rome, Italy.
Department of Experimental Medicine, Research Unit on "Food Science and Human Nutrition", "Sapienza" University of Rome, Rome, Italy.
J Appl Stat. 2020 May 13;48(9):1644-1658. doi: 10.1080/02664763.2020.1763930. eCollection 2021.
In prediction problems, both the response and the covariates may be highly correlated with a second group of influential regressors, which can be regarded as background variables. An important challenge is to perform variable selection and importance assessment among the covariates in the presence of these background variables. A clinical example is the prediction of lean body mass (response) from bioimpedance (covariates), where anthropometric measures play the role of background variables. We introduce a reduced dataset in which the variables are defined as their residuals with respect to the background, and we perform variable selection and importance assessment in both linear and random forest models. Using a clinical dataset of multi-frequency bioimpedance, we show that this method is effective in selecting the most relevant predictors of lean body mass beyond anthropometry.
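The reduced dataset described above can be illustrated with a minimal sketch. It assumes the residuals are taken from an ordinary least squares fit on the background variables with an intercept; the abstract does not specify the fitting procedure. The data are synthetic, and the helper `residualize` and all variable names are hypothetical, not taken from the paper.

```python
import numpy as np


def residualize(target, background):
    """Return residuals of an OLS fit of `target` on `background` plus an intercept.

    Assumption: plain least squares is used to remove the background;
    the abstract does not state which fit the authors used.
    """
    design = np.column_stack([np.ones(len(background)), background])
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    return target - design @ coef


# Synthetic stand-ins (not clinical data): two background variables playing
# the role of anthropometric measures, and three covariates correlated with them.
rng = np.random.default_rng(0)
n = 200
background = rng.normal(size=(n, 2))
covariates = background @ rng.normal(size=(2, 3)) + 0.5 * rng.normal(size=(n, 3))
response = (
    background @ np.array([2.0, -1.0])
    + covariates @ np.array([1.5, 0.0, -0.7])
    + 0.3 * rng.normal(size=n)
)

# Reduced dataset: every variable is replaced by its residual
# with respect to the background.
reduced_covariates = residualize(covariates, background)
reduced_response = residualize(response, background)

# Linear model fitted on the reduced dataset (no intercept is needed,
# because the residuals already have zero mean).
reduced_coef, *_ = np.linalg.lstsq(reduced_covariates, reduced_response, rcond=None)

# Reference fit on the full dataset: intercept, background, covariates.
full_design = np.column_stack([np.ones(n), background, covariates])
full_coef, *_ = np.linalg.lstsq(full_design, response, rcond=None)
```

In the linear case, the coefficients fitted on the reduced dataset coincide with the covariate coefficients of the full regression that includes the background (the Frisch-Waugh-Lovell theorem), which is one way to see why the residuals isolate the contribution of the covariates beyond the background. The same reduced arrays could be passed to a random forest to assess importance, although no such equivalence holds for that model.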