1 Department of Mathematics, Northumbria University, Newcastle upon Tyne, UK.
2 Centre for Statistics in Medicine, University of Oxford, Oxford, UK.
Stat Methods Med Res. 2019 Jan;28(1):102-116. doi: 10.1177/0962280217715663. Epub 2017 Jul 5.
Sample selection arises when the outcome of interest is only partially observed in a study. Although sophisticated parametric and non-parametric statistical methods have been proposed to address this problem, it remains unclear how to deal with selectively missing covariate data using simple multiple imputation techniques, especially in the absence of exclusion restrictions and under deviation from normality. Motivated by the 2003-2004 NHANES data, in which previous authors studied the effect of socio-economic status on blood pressure with missing data on the income variable, we propose a robust imputation technique based on the selection-t sample selection model. The imputation method, which is developed within the frequentist framework, is compared with competing alternatives in a simulation study. The results indicate that the robust alternative is not susceptible to the absence of exclusion restrictions (a property inherited from the parent selection-t model) and performs better than models based on the normal assumption, even when the data are generated from the normal distribution. Applications to missing outcome and covariate data further corroborate the robustness properties of the proposed method. We implemented the proposed approach within the MICE environment in the R statistical software.
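To make the sample selection problem concrete, the following is a minimal sketch of how selective missingness can bias a complete-case estimate when the outcome and selection errors are correlated and heavy-tailed. It is not the imputation method described above: the function name `simulate_selection`, the correlation `rho`, the degrees of freedom `df`, the intercepts, and the sample size are all illustrative assumptions, and the selection equation deliberately contains no extra covariate (that is, no exclusion restriction).

```python
import math
import random
import statistics


def simulate_selection(n=20000, rho=0.7, df=5, seed=1):
    """Simulate an outcome that is observed only when a correlated latent
    selection variable is positive. All parameter values are illustrative
    assumptions, not values taken from the paper."""
    rng = random.Random(seed)
    all_y, observed_y = [], []
    for _ in range(n):
        # Correlated standard normal pair (z1, z2) with correlation rho.
        z1 = rng.gauss(0, 1)
        z2 = rho * z1 + math.sqrt(1 - rho ** 2) * rng.gauss(0, 1)
        # Dividing both by the same sqrt(chi-square / df) draw gives
        # bivariate Student-t errors with heavy tails.
        w = math.sqrt(rng.gammavariate(df / 2, 2) / df)
        e_outcome, e_selection = z1 / w, z2 / w

        y = 1.0 + e_outcome            # outcome equation, true mean 1.0
        all_y.append(y)
        if 0.5 + e_selection > 0:      # selection equation, no extra covariate
            observed_y.append(y)

    full_mean = statistics.fmean(all_y)
    observed_mean = statistics.fmean(observed_y)
    return full_mean, observed_mean, len(observed_y) / n


if __name__ == "__main__":
    full_mean, observed_mean, frac = simulate_selection()
    print(f"mean of all outcomes:      {full_mean:.3f}")
    print(f"mean of observed outcomes: {observed_mean:.3f}")
    print(f"fraction observed:         {frac:.3f}")
```

With a positive `rho`, units with larger outcome errors are more likely to be observed, so the mean of the observed outcomes overstates the mean of all outcomes. Correcting this kind of bias at the imputation step is what a selection-model-based imputation approach aims to do.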