Zhang Nanhua, Little Roderick J
Department of Epidemiology & Biostatistics, College of Public Health, University of South Florida, Tampa, Florida 33612-3085, USA.
Biometrics. 2012 Sep;68(3):933-42. doi: 10.1111/j.1541-0420.2011.01718.x. Epub 2011 Dec 7.
We consider the linear regression of outcome Y on regressors W and Z with some values of W missing, when our main interest is the effect of Z on Y, controlling for W. Three common approaches to regression with missing covariates are (i) complete-case analysis (CC), which discards the incomplete cases; (ii) ignorable likelihood methods, which base inference on the likelihood of the observed data, assuming the missing data are missing at random (Rubin, 1976b); and (iii) nonignorable modeling, which posits a joint distribution of the variables and the missing-data indicators. Another simple, practical approach that has received little theoretical attention is to drop the regressor variables containing missing values from the regression model (DV, for drop variables). DV does not lead to bias when either (i) the regression coefficient of W is zero or (ii) W and Z are uncorrelated. We propose a pseudo-Bayesian approach for regression with missing covariates that compromises between the CC and DV estimates, exploiting information in the incomplete cases when the data support the DV assumptions. We illustrate favorable properties of the method by simulation and apply it to a liver cancer study. Extension of the method to more than one missing covariate is also discussed.
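The two conditions under which DV is unbiased follow from the omitted-variable formula: when W is dropped, the least-squares slope on Z converges to β_Z + β_W·Cov(W, Z)/Var(Z), so the bias vanishes exactly when β_W = 0 or Cov(W, Z) = 0. The sketch below checks this numerically by comparing CC and DV estimates of the Z coefficient. It is an illustration under assumed conditions, not the authors' pseudo-Bayesian estimator: the function name `simulate`, the standard bivariate normal design for (Z, W), the coefficient values, and the missing-completely-at-random mechanism for W are all hypothetical choices made for this example.

```python
import numpy as np


def simulate(beta_w, rho, n=200_000, miss_prob=0.5, beta_z=1.0, seed=0):
    """Return (CC estimate, DV estimate, DV large-sample limit) of the Z slope.

    Hypothetical setup (not taken from the paper): (Z, W) are standard
    bivariate normal with correlation rho, Y = 1 + beta_z*Z + beta_w*W + e with
    e ~ N(0, 1), and each W value is missing completely at random with
    probability miss_prob.
    """
    rng = np.random.default_rng(seed)
    z = rng.standard_normal(n)
    w = rho * z + np.sqrt(1.0 - rho**2) * rng.standard_normal(n)
    y = 1.0 + beta_z * z + beta_w * w + rng.standard_normal(n)
    observed = rng.random(n) >= miss_prob  # True where W is observed

    # CC: regress Y on (1, Z, W) using only the cases with W observed.
    x_cc = np.column_stack([np.ones(observed.sum()), z[observed], w[observed]])
    cc = np.linalg.lstsq(x_cc, y[observed], rcond=None)[0][1]

    # DV: drop W entirely and regress Y on (1, Z) using all n cases.
    x_dv = np.column_stack([np.ones(n), z])
    dv = np.linalg.lstsq(x_dv, y, rcond=None)[0][1]

    # Omitted-variable limit of DV: beta_z + beta_w * Cov(W, Z) / Var(Z),
    # which equals beta_z + beta_w * rho because Var(Z) = 1 here.
    return cc, dv, beta_z + beta_w * rho


if __name__ == "__main__":
    for beta_w, rho in [(0.0, 0.5), (0.5, 0.0), (0.5, 0.5)]:
        cc, dv, limit = simulate(beta_w, rho)
        print(f"beta_w={beta_w}, rho={rho}: CC={cc:.3f}, DV={dv:.3f}, DV limit={limit:.3f}")
```

With β_W = 0 or ρ = 0, both estimates should sit near the true β_Z = 1, and DV uses all n cases rather than only the complete ones; with both nonzero, DV drifts toward β_Z + β_W·ρ while CC stays near β_Z. This trade-off between DV's extra information and its potential bias is what a compromise between the CC and DV estimates is meant to balance.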