Silverwood Richard J, Calderwood Lisa, Henderson Morag, Sakshaug Joseph W, Ploubidis George B
University College London, UK.
University of Warwick, UK.
Longit Life Course Stud. 2024 Feb 15;15(2):227-250. doi: 10.1332/17579597Y2024D000000010.
Non-response is common in longitudinal surveys, reducing efficiency and introducing the potential for bias. Principled methods, such as multiple imputation, are generally required to obtain unbiased estimates in surveys subject to missingness that is not completely at random. Including predictors of non-response in such methods, for example as auxiliary variables in multiple imputation, can help improve the plausibility of the missing at random assumption underlying these methods and hence reduce bias. We present a systematic data-driven approach used to identify predictors of non-response at Wave 8 (age 25-26) of Next Steps, a UK national cohort study that follows a sample of 15,770 young people from age 13-14 years. The identified predictors of non-response spanned a number of broad categories, including personal characteristics, schooling and behaviour in school, activities and behaviour outside of school, mental health and well-being, socio-economic status, and practicalities around contact and survey completion. We found that including these predictors of non-response as auxiliary variables in multiple imputation analyses allowed us to restore sample representativeness in several different settings, though we acknowledge that this is unlikely to be universally the case. We propose that these variables be considered for inclusion in future analyses using principled methods to explore and attempt to reduce bias due to non-response in Next Steps. Our data-driven approach to this issue could also be used as a model for investigations in other longitudinal studies.
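The role of auxiliary variables described above can be illustrated with a small simulation. The sketch below is not the authors' procedure and uses no Next Steps data: it assumes a toy cohort in which a single, always-observed auxiliary variable `z` drives both the outcome `y` and the probability of response, so that data are missing at random given `z`. It uses bootstrap-based regression imputation as a simplified stand-in for multiple imputation and pools only the point estimate. All names, coefficients, and sample sizes are hypothetical.

```python
import math
import random
import statistics


def simulate(n=5000, seed=0):
    """Simulate a toy cohort: auxiliary z, outcome y, response indicator."""
    rng = random.Random(seed)
    z, y, observed = [], [], []
    for _ in range(n):
        zi = rng.gauss(0, 1)                    # auxiliary variable, always observed
        yi = 1.0 + 2.0 * zi + rng.gauss(0, 1)   # outcome depends on z
        # Response depends on z only, so y is missing at random given z.
        p_respond = 1 / (1 + math.exp(-(0.5 - 1.5 * zi)))
        z.append(zi)
        y.append(yi)
        observed.append(rng.random() < p_respond)
    return z, y, observed


def ols(xs, ys):
    """Simple linear regression: intercept, slope, residual SD."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (v - my) for x, v in zip(xs, ys)) / sxx
    intercept = my - slope * mx
    rss = sum((v - intercept - slope * x) ** 2 for x, v in zip(xs, ys))
    return intercept, slope, math.sqrt(rss / (len(xs) - 2))


def mi_mean(z, y, observed, m=20, seed=1):
    """Pooled mean of y over m imputed datasets, using z as the auxiliary."""
    rng = random.Random(seed)
    respondents = [(zi, yi) for zi, yi, o in zip(z, y, observed) if o]
    estimates = []
    for _ in range(m):
        # Bootstrap respondents to reflect uncertainty in the imputation model.
        boot = [rng.choice(respondents) for _ in respondents]
        a, b, sd = ols([p[0] for p in boot], [p[1] for p in boot])
        # Keep observed outcomes; impute missing ones from the model plus noise.
        completed = [
            yi if o else a + b * zi + rng.gauss(0, sd)
            for zi, yi, o in zip(z, y, observed)
        ]
        estimates.append(statistics.fmean(completed))
    # Pooled point estimate: average of the per-imputation estimates.
    return statistics.fmean(estimates)


z, y, observed = simulate()
full_mean = statistics.fmean(y)   # benchmark, unavailable in a real survey
cc_mean = statistics.fmean(yi for yi, o in zip(y, observed) if o)
mi = mi_mean(z, y, observed)

print(f"full-sample mean:   {full_mean:.3f}")
print(f"complete-case mean: {cc_mean:.3f}")
print(f"imputed mean:       {mi:.3f}")
```

In this toy setting the complete-case mean is pulled away from the full-sample mean because respondents differ systematically in `z`, while the imputed mean should sit much closer to it. In real data the benchmark is unknown, and the improvement depends on how well the chosen auxiliary variables capture the non-response mechanism.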