1 National Institute for Health and Welfare, Department of Public Health Solutions, Finland.
2 Statistics Finland, Finland.
Scand J Public Health. 2019 Jun;47(4):469-473. doi: 10.1177/1403494819840895. Epub 2019 Apr 11.
We aimed to compare four weighting methods for adjusting for non-response in a survey on drinking habits, and to examine whether these methods could remedy the under-coverage of survey estimates of alcohol use relative to sales statistics.
Data from a 2016 general population survey of Finns aged 15-79 years (n=2285, response rate 60%) were used. Outcome measures were the annual volume of drinking and the prevalence of hazardous drinking. A wide range of sociodemographic and regional variables from registers were available to model non-response. Response propensities were modelled using logistic regression and random forest models to derive two sets of refined weights, in addition to design weights and basic post-stratification weights.
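The propensity-weighting idea described above can be illustrated with a minimal sketch. It is not the study's implementation: instead of a fitted logistic regression or random forest, it estimates each response propensity as the response rate within a single categorical register variable (hypothetically named `cell`), and all data, names, and values are invented for illustration. Each respondent's design weight is divided by the estimated propensity of their cell, so groups that respond less often count for more.

```python
from collections import defaultdict


def propensity_weights(sample, design_weight=1.0):
    """Return one non-response weight per sampled person.

    `sample` is a list of dicts with a 'cell' key (a register-based
    category known for respondents and non-respondents alike) and a
    boolean 'responded' key. Non-respondents get weight 0.0.
    Assumption: equal design weights and cell-level response rates
    stand in for model-based response propensities.
    """
    invited = defaultdict(float)
    responded = defaultdict(float)
    for person in sample:
        invited[person["cell"]] += design_weight
        if person["responded"]:
            responded[person["cell"]] += design_weight
    # Estimated response propensity per cell (weighted response rate).
    propensity = {c: responded[c] / invited[c] for c in invited}
    return [
        design_weight / propensity[p["cell"]] if p["responded"] else 0.0
        for p in sample
    ]


def weighted_mean(values, weights):
    """Weighted mean over units with a positive weight."""
    pairs = [(v, w) for v, w in zip(values, weights) if w > 0]
    return sum(v * w for v, w in pairs) / sum(w for _, w in pairs)


# Hypothetical sample: the 'young' cell responds at 50%, the 'old' cell at 100%.
sample = [
    {"cell": "young", "responded": True, "litres": 4.0},
    {"cell": "young", "responded": True, "litres": 2.0},
    {"cell": "young", "responded": False, "litres": None},
    {"cell": "young", "responded": False, "litres": None},
    {"cell": "old", "responded": True, "litres": 1.0},
    {"cell": "old", "responded": True, "litres": 1.0},
    {"cell": "old", "responded": True, "litres": 2.0},
    {"cell": "old", "responded": True, "litres": 2.0},
]

weights = propensity_weights(sample)
litres = [p["litres"] for p in sample]
unweighted = weighted_mean(litres, [1.0 if p["responded"] else 0.0 for p in sample])
adjusted = weighted_mean(litres, weights)
```

In this toy sample the weights of the respondents sum to the number of people sampled, and the adjusted mean moves towards the under-represented cell. Such weights can only correct for differences captured by the variables used to form them.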
Estimated annual consumption was 2.43 litres of 100% alcohol using design weights and 2.36-2.44 litres using the other three weights; the estimated prevalence of hazardous drinking was 11.4% and 11.4-11.8%, respectively. Weights derived by the random forest method generally gave smaller estimates than the logistic regression-based weights.
The use of complex non-response weights derived from a logistic regression model or a random forest is not likely to provide much added value over simpler weights in surveys on alcohol use. Surveys may not capture heavy drinkers and are therefore prone to under-reporting of alcohol use at the population level. Factors other than sociodemographic characteristics are also likely to influence participation decisions.