Geraci Marco
Centre for Paediatric Epidemiology and Biostatistics, Institute of Child Health, University College London, UK
Stat Methods Med Res. 2016 Aug;25(4):1393-421. doi: 10.1177/0962280213484401. Epub 2013 Apr 23.
The estimation of population parameters using complex survey data requires careful statistical modelling to account for the design features. This is further complicated by unit and item nonresponse for which a number of methods have been developed in order to reduce estimation bias. In this paper, we address some issues that arise when the target of the inference (i.e. the analysis model or model of interest) is the conditional quantile of a continuous outcome. Survey design variables are duly included in the analysis and a bootstrap variance estimation approach is proposed. Missing data are multiply imputed by means of chained equations. In particular, imputation of continuous variables is based on their empirical distribution, conditional on all other variables in the analysis. This method preserves the distributional relationships in the data, including conditional skewness and kurtosis, and successfully handles bounded outcomes. Our motivating study concerns the analysis of birthweight determinants in a large UK-based cohort of children. A novel finding on the parental conflict theory is reported. R code implementing these procedures is provided.
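As a rough illustration of two ideas named in the abstract (a quantile as the estimation target, and bootstrap variance estimation under a survey design), the following minimal sketch may help. It is not the R code provided with the paper. It assumes the simplest case, an unconditional, intercept-only quantile with survey weights, and it uses a naive cluster bootstrap that resamples whole sampling units without rescaling the weights; the paper's own bootstrap scheme may differ. The function names `check_loss`, `weighted_quantile` and `bootstrap_se` are hypothetical and chosen for illustration only.

```python
import random


def check_loss(u, tau):
    """Quantile (check) loss: rho_tau(u) = u * (tau - 1{u < 0})."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def weighted_quantile(y, w, tau):
    """Return a minimizer of sum_i w_i * rho_tau(y_i - q) over q.

    A minimizer is the smallest observed value whose cumulative weight
    reaches tau times the total weight.
    """
    pairs = sorted(zip(y, w))
    threshold = tau * sum(w)
    cumulative = 0.0
    for value, weight in pairs:
        cumulative += weight
        if cumulative >= threshold:
            return value
    return pairs[-1][0]


def bootstrap_se(y, w, cluster, tau, n_boot=200, seed=0):
    """Naive cluster bootstrap standard error of the weighted quantile.

    Assumption: whole clusters are resampled with replacement and the
    original weights are reused unchanged (no weight rescaling).
    """
    rng = random.Random(seed)
    ids = sorted(set(cluster))
    by_cluster = {c: [] for c in ids}
    for yi, wi, ci in zip(y, w, cluster):
        by_cluster[ci].append((yi, wi))

    replicates = []
    for _ in range(n_boot):
        ys, ws = [], []
        # Draw as many clusters as in the original sample.
        for c in (rng.choice(ids) for _ in ids):
            for yi, wi in by_cluster[c]:
                ys.append(yi)
                ws.append(wi)
        replicates.append(weighted_quantile(ys, ws, tau))

    mean = sum(replicates) / len(replicates)
    variance = sum((r - mean) ** 2 for r in replicates) / (len(replicates) - 1)
    return variance ** 0.5
```

Calling `weighted_quantile(y, w, 0.5)` gives a design-weighted median, and `bootstrap_se(y, w, cluster, 0.5)` gives an approximate standard error for it. Extending the sketch to conditional quantiles would replace the single location `q` with a linear predictor and minimize the same weighted check loss over its coefficients.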