Karvanen Juha, Tolonen Hanna, Härkänen Tommi, Jousilahti Pekka, Kuulasmaa Kari
Department of Mathematics and Statistics, University of Jyväskylä, P.O. Box (MaD), FI-40014, Jyväskylä, Finland.
Department of Health, National Institute for Health and Welfare, P.O. Box 30, FI-00271, Helsinki, Finland.
J Clin Epidemiol. 2016 Aug;76:209-17. doi: 10.1016/j.jclinepi.2016.02.026. Epub 2016 Mar 9.
One of the main goals of health examination surveys is to provide unbiased estimates of health indicators at the population level. We demonstrate how multiple imputation methods may help reduce selection bias when partial data are collected on some nonparticipants.
In the FINRISK 2007 study, a population-based health study conducted in Finland, a random sample of 10,000 men and women aged 25-74 years was invited to participate. The study comprised questionnaire data collection and a health examination. A total of 6,255 individuals participated in the study. Of the 3,745 nonparticipants, 473 returned a simplified questionnaire after a recontact. Both the participants and the nonparticipants were followed up for death and hospitalizations. The follow-up data made it possible to check the assumptions about the missing data mechanism, and tailored multiple imputation methods were used to handle the missing data.
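As a quick check on the counts reported above, the following Python sketch derives the participation rate, the recontact response rate among nonparticipants, and the share of the invited sample with at least questionnaire data. The variable names are illustrative and do not come from the study.

```python
# Counts reported for FINRISK 2007 in the text above.
invited = 10_000
participants = 6_255
recontact_respondents = 473

# Everyone invited who did not take part in the main study.
nonparticipants = invited - participants

# Share of the invited sample that took part in the main study.
participation_rate = participants / invited

# Share of nonparticipants who returned the simplified questionnaire.
recontact_rate = recontact_respondents / nonparticipants

# Share of the invited sample with at least some questionnaire data.
any_data_rate = (participants + recontact_respondents) / invited

print(f"nonparticipants:      {nonparticipants}")
print(f"participation rate:   {participation_rate:.2%}")
print(f"recontact rate:       {recontact_rate:.2%}")
print(f"any questionnaire:    {any_data_rate:.2%}")
```

The recontact adds data for only a small fraction of the sample, so its value lies less in the extra observations than in what they indicate about the nonparticipants as a group.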
Nonparticipation is a strong predictor of mortality in the five-year follow-up. However, the recontact response does not predict mortality or morbidity among the nonparticipants when adjusted for age and sex. This result suggests that the recontact respondents can be used as a proxy for all nonparticipants. A comparison of raw estimates and estimates adjusted for selection bias reveals clear differences in the estimated population prevalences of smoking and heavy alcohol use.
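The proxy idea can be illustrated with a minimal Python sketch. It is not the tailored imputation model used in the study: it ignores age, sex and all other covariates, treats the sample as a simple random sample, and uses a Beta posterior with a uniform prior as an assumed imputation model. The smoking counts (1,250 of 6,255 participants and 165 of 473 recontact respondents) are hypothetical values chosen only for illustration, and the function name `mi_prevalence` is likewise an assumption.

```python
import random
import statistics


def mi_prevalence(part_yes, part_n, rec_yes, rec_n, nonpart_n,
                  n_imputations=50, seed=1):
    """Multiply impute a binary indicator for nonparticipants without data,
    using recontact respondents as a proxy, and pool with Rubin's rules."""
    rng = random.Random(seed)
    total_n = part_n + nonpart_n
    missing_n = nonpart_n - rec_n  # nonparticipants with no questionnaire data

    estimates, variances = [], []
    for _ in range(n_imputations):
        # Draw the nonparticipant prevalence from its Beta posterior
        # (uniform prior; an assumption of this sketch).
        p_draw = rng.betavariate(rec_yes + 1, rec_n - rec_yes + 1)
        # Impute the indicator for each nonparticipant without data.
        imputed_yes = sum(rng.random() < p_draw for _ in range(missing_n))
        # Prevalence in the completed data set and its sampling variance.
        p_hat = (part_yes + rec_yes + imputed_yes) / total_n
        estimates.append(p_hat)
        variances.append(p_hat * (1 - p_hat) / total_n)

    # Rubin's rules: total variance = within + (1 + 1/m) * between.
    pooled = statistics.fmean(estimates)
    within = statistics.fmean(variances)
    between = statistics.variance(estimates)
    total_var = within + (1 + 1 / n_imputations) * between
    return pooled, total_var ** 0.5


# Hypothetical smoking counts; only the group sizes come from the text.
raw = 1_250 / 6_255
adjusted, se = mi_prevalence(part_yes=1_250, part_n=6_255,
                             rec_yes=165, rec_n=473, nonpart_n=3_745)
print(f"raw prevalence (participants only): {raw:.3f}")
print(f"adjusted prevalence (pooled):       {adjusted:.3f} (SE {se:.3f})")
```

With these hypothetical inputs the adjusted estimate lies above the raw one, because the recontact respondents smoke more often than the participants and stand in for the 3,745 nonparticipants.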
All efforts to collect data on nonparticipants are likely to be useful even if the response rate for the recontact remains low. Statistical analysis of the recontact respondents provides an indication of the extent of the selection bias, even in studies where follow-up data are not available to check the assumptions.