University College London, London, UK.
J Clin Epidemiol. 2021 Aug;136:44-54. doi: 10.1016/j.jclinepi.2021.02.019. Epub 2021 Feb 27.
Non-response is unavoidable in longitudinal surveys. The consequences are lower statistical power and the potential for bias. We implemented a systematic data-driven approach to identify predictors of non-response in the National Child Development Study (NCDS; 1958 British birth cohort). Such variables can help make the missing at random assumption more plausible, which has implications for the handling of missing data.
STUDY DESIGN AND SETTING: We identified predictors of non-response using data from the 11 sweeps (birth to age 55) of the NCDS (n = 17,415), employing parametric regressions and the LASSO for variable selection.
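As an illustration of how the LASSO can be used to select predictors of non-response, the following is a minimal sketch and not the study's actual procedure. It fits an L1-penalised logistic regression by proximal gradient descent on synthetic data. The data, the penalty value, and all names (`fit_l1_logistic`, `rows`, `non_response`) are assumptions made for this example. In practice an established statistical package would be used and the penalty would be tuned, for example by cross-validation.

```python
import math
import random


def sigmoid(t):
    """Numerically stable logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def soft_threshold(value, threshold):
    """Shrink a value towards zero; values within the threshold become exactly 0."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def fit_l1_logistic(rows, outcome, penalty, step=0.5, n_iter=300):
    """L1-penalised logistic regression via proximal gradient descent.

    The intercept is not penalised. Returns (intercept, coefficients).
    """
    n, p = len(rows), len(rows[0])
    intercept, coefs = 0.0, [0.0] * p
    for _ in range(n_iter):
        # Residuals: predicted probability of non-response minus observed outcome.
        resid = [
            sigmoid(intercept + sum(c * x for c, x in zip(coefs, row))) - y
            for row, y in zip(rows, outcome)
        ]
        grad_intercept = sum(resid) / n
        grads = [sum(r * row[j] for r, row in zip(resid, rows)) / n for j in range(p)]
        intercept -= step * grad_intercept
        # Gradient step followed by soft-thresholding (the LASSO proximal step).
        coefs = [soft_threshold(c - step * g, step * penalty) for c, g in zip(coefs, grads)]
    return intercept, coefs


# Synthetic data (hypothetical): six standardised candidate predictors,
# of which only the first two truly drive non-response.
rng = random.Random(0)
n_people, n_candidates = 800, 6
rows = [[rng.gauss(0, 1) for _ in range(n_candidates)] for _ in range(n_people)]
non_response = [
    1 if rng.random() < sigmoid(-0.5 + 1.2 * r[0] - 1.0 * r[1]) else 0 for r in rows
]

intercept, coefs = fit_l1_logistic(rows, non_response, penalty=0.08)
selected = [j for j, c in enumerate(coefs) if c != 0.0]
print("selected candidate predictors:", selected)
```

Candidates whose coefficients are shrunk exactly to zero are dropped, which is what makes the LASSO usable as a variable-selection step before fitting an unpenalised regression on the retained predictors.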
Disadvantaged socio-economic background in childhood, worse mental health and lower cognitive ability in early life, and lack of civic and social participation in adulthood were consistently associated with non-response. Using this information, along with other data from NCDS, we were able to replicate the "population distribution" of educational attainment and marital status (derived from external data), and the original distributions of key early life characteristics.
The identified predictors of non-response have the potential to improve the plausibility of the missing at random assumption. They can be straightforwardly used as "auxiliary variables" in analyses with principled methods to reduce bias due to missing data.
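To make concrete why an auxiliary variable can reduce bias, here is a small simulation sketch. It rests on assumptions invented for the example: a single auxiliary variable `z` that drives both the outcome `y` and the chance of responding, so that data are missing at random given `z`. Simple regression-based imputation of the mean stands in here for the principled methods mentioned above (such as multiple imputation). None of the numbers come from NCDS.

```python
import math
import random

rng = random.Random(1958)
n = 5000

# Hypothetical auxiliary variable (e.g. a standardised early-life score).
z = [rng.gauss(0, 1) for _ in range(n)]
# Outcome depends on the auxiliary variable; its true mean is 2.0.
y = [2.0 + 1.5 * zi + rng.gauss(0, 1) for zi in z]


def response_prob(zi):
    """Assumed response model: people with lower z respond less often."""
    return 1.0 / (1.0 + math.exp(-(0.5 + 1.5 * zi)))


responded = [rng.random() < response_prob(zi) for zi in z]

# Benchmark that would be unavailable in practice: mean of the complete outcome.
full_mean = sum(y) / n

# Complete-case estimate: ignores non-respondents and the auxiliary variable.
y_obs = [yi for yi, r in zip(y, responded) if r]
z_obs = [zi for zi, r in zip(z, responded) if r]
complete_case_mean = sum(y_obs) / len(y_obs)

# Fit y ~ z among respondents by ordinary least squares.
z_bar = sum(z_obs) / len(z_obs)
y_bar = sum(y_obs) / len(y_obs)
slope = sum((zi - z_bar) * (yi - y_bar) for zi, yi in zip(z_obs, y_obs)) / sum(
    (zi - z_bar) ** 2 for zi in z_obs
)
intercept = y_bar - slope * z_bar

# Predict the outcome for everyone from z (observed for all) and average.
adjusted_mean = sum(intercept + slope * zi for zi in z) / n

print("full-sample mean:   ", round(full_mean, 3))
print("complete-case mean: ", round(complete_case_mean, 3))
print("auxiliary-adjusted: ", round(adjusted_mean, 3))
```

Because response depends on `z`, the complete-case mean overstates the outcome, whereas the estimate that conditions on `z` is close to the full-sample value. The adjustment only works to the extent that the auxiliary variables capture what drives non-response.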