Marino Miguel, Buxton Orfeu M, Li Yi
Department of Family Medicine, Department of Public Health, Division of Biostatistics, Oregon Health and Science University, Portland, OR 97239 USA.
Associate Professor, Department of Biobehavioral Health, Pennsylvania State University, University Park, PA 16802. Lecturer on Medicine, Division of Sleep Medicine, Harvard Medical School, Boston, MA 02115. Associate Neuroscientist, Department of Medicine, Brigham and Women's Hospital, Boston, MA 02115. Adjunct Associate Professor, Department of Social and Behavioral Sciences, Harvard T.H. Chan School of Public Health, Boston, MA 02115.
Stat (Int Stat Inst). 2017;6(1):31-46. doi: 10.1002/sta4.133. Epub 2017 Jan 8.
Missing covariate data hampers variable selection in multilevel regression settings. Current variable selection techniques for multiply-imputed data commonly address missingness in the predictors through list-wise deletion and stepwise selection, both of which are problematic. Moreover, most variable selection methods are developed for independent linear regression models and do not accommodate multilevel mixed effects regression models with incomplete covariate data. We develop a novel methodology that performs covariate selection across multiply-imputed data for multilevel random effects models when missing data is present. Specifically, we propose to stack the multiply-imputed data sets from a multiple imputation procedure and to apply a group variable selection procedure through group lasso regularization, which assesses the overall impact of each predictor on the outcome across the imputed data sets. Simulations confirm the advantageous performance of the proposed method compared with competing methods. We applied the method to reanalyze the Healthy Directions-Small Business cancer prevention study, which evaluated a behavioral intervention program targeting multiple risk-related behaviors in a working-class, multi-ethnic population.
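The stacking idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and it makes several simplifying assumptions: it fits an ordinary linear regression with no random effects, it uses synthetic data, it replaces a proper multiple imputation procedure with a crude draw from the observed marginal distribution, and it solves the group lasso with a plain proximal-gradient loop. The function name `group_lasso_across_imputations` and all parameter values are hypothetical. Each predictor gets one coefficient per imputed data set, and those coefficients form one group, so a predictor is kept or dropped in all imputed data sets at once.

```python
import numpy as np


def group_lasso_across_imputations(X_list, y_list, lam, n_iter=500):
    """Proximal-gradient group lasso over D imputed data sets.

    Row j of the returned (p, D) matrix holds predictor j's coefficient
    in each imputed data set; each row is penalized as one group.
    """
    D = len(X_list)
    n, p = X_list[0].shape
    B = np.zeros((p, D))
    # Step size from the largest Lipschitz constant of the per-data-set losses.
    lipschitz = max(np.linalg.norm(X, 2) ** 2 / n for X in X_list)
    step = 1.0 / lipschitz
    for _ in range(n_iter):
        # Gradient of (1 / 2n) * ||y_d - X_d b_d||^2 for each data set d.
        G = np.column_stack([
            -X.T @ (y - X @ B[:, d]) / n
            for d, (X, y) in enumerate(zip(X_list, y_list))
        ])
        Z = B - step * G
        # Group soft-thresholding: shrink each predictor's row as a block.
        norms = np.linalg.norm(Z, axis=1, keepdims=True)
        scale = np.maximum(0.0, 1.0 - step * lam / np.maximum(norms, 1e-12))
        B = scale * Z
    return B


# Synthetic example (assumed values): 2 relevant predictors, 4 noise predictors.
rng = np.random.default_rng(0)
n, p, D = 300, 6, 5
X_full = rng.normal(size=(n, p))
beta_true = np.array([2.0, -1.5, 0.0, 0.0, 0.0, 0.0])
y = X_full @ beta_true + rng.normal(size=n)

# Make about 20% of predictor 1 missing completely at random.
missing = rng.random(n) < 0.2
observed = X_full[~missing, 1]

# Crude stand-in for multiple imputation: draw missing values from a
# normal distribution matched to the observed mean and standard deviation.
X_list, y_list = [], []
for _ in range(D):
    X_imp = X_full.copy()
    X_imp[missing, 1] = rng.normal(observed.mean(), observed.std(), missing.sum())
    X_list.append(X_imp)
    y_list.append(y)

B = group_lasso_across_imputations(X_list, y_list, lam=1.0)
# A predictor is selected when its whole group (row) is non-zero.
selected = [j for j in range(p) if np.linalg.norm(B[j]) > 0]
print("selected predictors:", selected)
```

Because the penalty acts on each row as a whole, the selection decision is the same in every imputed data set, which avoids having to reconcile different selected models across imputations. The penalty level `lam` is fixed by hand here; in practice it would be chosen by a criterion such as cross-validation or an information criterion.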