Department of Health Systems Administration, School of Nursing & Health Studies, Georgetown University, Washington, DC, United States.
Department of Psychiatry, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, United States.
Prev Med. 2017 Sep;102:93-99. doi: 10.1016/j.ypmed.2017.07.006. Epub 2017 Jul 8.
Internet-based crowdsourcing is increasingly used for social and behavioral research in public health; however, the generalizability of crowdsourced data remains unclear. This study assessed the population representativeness of Internet-based crowdsourced data.
A total of 3999 U.S. young adults ages 18 to 30 years were recruited in 2016 through Internet-based crowdsourcing to complete measures taken from the 2012-2013 National Adult Tobacco Survey (NATS). Post-hoc sampling weights were created using procedures similar to those of the NATS. Weighted analyses were conducted in 2016 to compare the crowdsourced data with publicly available 2012-2013 NATS data on demographics, tobacco use, and measures of tobacco perceptions and product warning label exposure.
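The idea behind post-hoc sampling weights can be illustrated with a minimal Python sketch of simple post-stratification, in which each respondent is weighted by the ratio of their demographic cell's population share to its sample share. This is an illustration under stated assumptions, not the study's or the NATS's actual procedure, which the abstract does not specify. The function names, the age cells, the population shares, and the toy responses are all hypothetical.

```python
from collections import Counter


def poststratification_weights(sample_cells, population_shares):
    """Return one weight per respondent: population share / sample share of their cell.

    sample_cells: list of cell labels, one per respondent (hypothetical cells).
    population_shares: dict mapping cell label -> share of the target population.
    """
    n = len(sample_cells)
    counts = Counter(sample_cells)
    return [population_shares[cell] / (counts[cell] / n) for cell in sample_cells]


def weighted_prevalence(indicators, weights):
    """Weighted proportion of respondents whose indicator equals 1."""
    return sum(w * y for y, w in zip(indicators, weights)) / sum(weights)


# Hypothetical toy data: the younger cell is over-represented in the sample.
cells = ["18-24", "18-24", "18-24", "25-30"]
uses_product = [1, 0, 0, 1]                 # 1 = reports use, 0 = does not
shares = {"18-24": 0.5, "25-30": 0.5}       # assumed population shares

weights = poststratification_weights(cells, shares)
unweighted = sum(uses_product) / len(uses_product)
estimate = weighted_prevalence(uses_product, weights)
```

In this toy example the weighted estimate differs from the unweighted one because the under-represented cell is weighted up. Weighting of this kind corrects only for the variables used to define the cells, so estimates can still diverge from population data on characteristics not included in the weights.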
Those in the crowdsourced sample were less likely to report an annual household income of $50,000 or greater, and e-cigarette, waterpipe, and cigar use were more prevalent in the crowdsourced sample. High proportions of both samples indicated that cigarette smoking is very harmful and very addictive. Comparable proportions of non-smokers and smokers reported cigarette warning label exposure; however, the likelihood of reporting that smoking is very harmful by frequency of warning label exposure was lower among smokers in the crowdsourced sample.
Our findings indicate that crowdsourced samples may differ demographically from population data and, even after post-hoc sample weighting, may not produce generalizable estimates of tobacco use prevalence. However, correlational analyses in crowdsourced samples may reasonably approximate population data. Future studies can build on this work by testing additional methodological strategies to improve crowdsourced sampling.