Barnett Adrian G, McElwee Paul, Nathan Andrea, Burton Nicola W, Turrell Gavin
School of Public Health and Social Work and Institute of Health and Biomedical Innovation, Queensland University of Technology, Kelvin Grove, Queensland, Australia.
Institute for Health and Ageing, Australian Catholic University, Melbourne, Victoria, Australia.
BMJ Open. 2017 Oct 30;7(10):e017284. doi: 10.1136/bmjopen-2017-017284.
To examine whether respondents to a survey of health and physical activity and potential determinants could be grouped according to the questions they missed, known as 'item missing'.
Observational study of longitudinal data.
Residents of Brisbane, Australia.
6901 people aged 40-65 years in 2007.
We used a latent class model with a mixture of multinomial distributions and chose the number of classes using the Bayesian information criterion. We used logistic regression to examine whether participants' characteristics were associated with their modal latent class, and whether the amount of item missing in one survey predicted wave missing in the following survey.
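The class-selection step described above can be illustrated with a minimal sketch. It is not the authors' code. It assumes each question is reduced to a binary indicator (1 = missed, 0 = answered), so the multinomial mixture becomes a mixture of independent Bernoulli items fitted by expectation-maximisation. The function names, the simulated data and all settings are hypothetical.

```python
import numpy as np

def log_joint(X, weights, probs):
    # log P(class k) + log P(row i | class k), shape (n_respondents, n_classes)
    return X @ np.log(probs).T + (1 - X) @ np.log(1 - probs).T + np.log(weights)

def fit_latent_class(X, n_classes, n_iter=200, seed=0):
    """EM for a mixture of independent Bernoulli items (assumed simplification).

    X is an (n_respondents, n_items) array of 0/1 where 1 means 'item missing'.
    Returns class weights, per-class missing probabilities, modal class, BIC.
    """
    rng = np.random.default_rng(seed)
    n, m = X.shape
    weights = np.full(n_classes, 1.0 / n_classes)
    probs = rng.uniform(0.2, 0.8, size=(n_classes, m))
    for _ in range(n_iter):
        # E-step: posterior class membership for each respondent
        lj = log_joint(X, weights, probs)
        resp = np.exp(lj - np.logaddexp.reduce(lj, axis=1, keepdims=True))
        # M-step: update class weights and item-missing probabilities
        weights = np.clip(resp.mean(axis=0), 1e-12, None)
        weights = weights / weights.sum()
        probs = (resp.T @ X) / (resp.sum(axis=0)[:, None] + 1e-12)
        probs = np.clip(probs, 1e-6, 1 - 1e-6)
    lj = log_joint(X, weights, probs)
    log_lik = np.logaddexp.reduce(lj, axis=1).sum()
    n_params = (n_classes - 1) + n_classes * m
    bic = -2.0 * log_lik + n_params * np.log(n)
    return weights, probs, lj.argmax(axis=1), bic

# Simulated data (hypothetical): most respondents rarely miss an item,
# while a small group misses many.
rng = np.random.default_rng(1)
high_missing = rng.random(2000) < 0.1
p_miss = np.where(high_missing, 0.3, 0.02)
X = (rng.random((2000, 20)) < p_miss[:, None]).astype(int)

# Fit candidate models and keep the number of classes with the lowest BIC.
results = {k: fit_latent_class(X, k) for k in (1, 2, 3)}
bics = {k: r[3] for k, r in results.items()}
best_k = min(bics, key=bics.get)
```

The modal class returned for each respondent could then serve as the outcome in a regression on participant characteristics, as in the analysis described above.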
Four per cent of participants missed almost one-fifth of the questions, and this group missed more questions in the middle of the survey. Eighty-three per cent of participants completed almost every question, but had a relatively high missing probability for a question on sleep time, a question which had an inconsistent presentation compared with the rest of the survey. Participants who completed almost every question were generally younger and more educated. Participants who completed more questions were less likely to miss the next longitudinal wave.
Examining patterns in item missing data has improved our understanding of how missing data were generated and has informed future survey design to help reduce missing data.