Yank Veronica, Agarwal Sanjhavi, Loftus Pooja, Asch Steven, Rehkopf David
Veronica Yank and Sanjhavi Agarwal are with the Division of General Internal Medicine, University of California, San Francisco. Pooja Loftus is with the Division of General Medical Disciplines, Stanford University, Stanford, CA. Steven Asch is with the VA Palo Alto Health Care System, Palo Alto, CA, and the Division of General Medical Disciplines, Stanford University. David Rehkopf is with the Division of General Medical Disciplines, Stanford University.
Am J Public Health. 2017 Aug;107(8):1283-1289. doi: 10.2105/AJPH.2017.303824. Epub 2017 Jun 22.
Objectives. To determine the generalizability of crowdsourced, electronic health data from self-selected individuals, using a national survey as a reference.
Methods. In 2015, we used the world's largest crowdsourcing platform to collect data on characteristics known to influence cardiovascular disease risk and identified comparable data from the 2013 Behavioral Risk Factor Surveillance System. We used age-stratified logistic regression models to identify differences among groups.
Results. Compared with respondents to the national survey, crowdsourced respondents were younger, were more likely to be non-Hispanic and White, and had higher educational attainment. Those aged 40 to 59 years were similar to US adults in rates of smoking, diabetes, hypertension, and hyperlipidemia. Those aged 18 to 39 years were less similar, whereas those aged 60 to 75 years were underrepresented among crowdsourced respondents.
Conclusions. Crowdsourced health data might be most generalizable to adults aged 40 to 59 years, but studies of younger or older populations, racial and ethnic minorities, or those with lower educational attainment should approach crowdsourced data with caution.
Public Health Implications. Policymakers, the national Precision Medicine Initiative, and others planning to use crowdsourced data should take explicit steps to define and address anticipated underrepresentation by important population subgroups.
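The age-stratified logistic regression named in the abstract can be illustrated with a minimal sketch. It is not the authors' analysis: the counts below are invented for illustration, the only predictor is the data source (crowdsourced versus reference survey), and the model omits any covariates or survey weights the study may have used. The names `fit_logistic`, `hypothetical_counts`, and `odds_ratios` are assumptions introduced here.

```python
import math

def fit_logistic(groups, iterations=25):
    """Fit logit P(y=1) = b0 + b1*x by Newton-Raphson on grouped binary data.

    groups: list of (x, n_events, n_total), with x = 1 for crowdsourced
    respondents and x = 0 for reference-survey respondents.
    Returns the fitted coefficients (b0, b1).
    """
    b0 = b1 = 0.0
    for _ in range(iterations):
        g0 = g1 = 0.0            # gradient of the log-likelihood
        h00 = h01 = h11 = 0.0    # information matrix (negative Hessian)
        for x, events, total in groups:
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            resid = events - total * p
            w = total * p * (1.0 - p)
            g0 += resid
            g1 += resid * x
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        # Newton step: solve (information matrix) * delta = gradient
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

# Hypothetical counts (NOT study data): per age stratum,
# (source indicator, number reporting the risk factor, number of respondents).
hypothetical_counts = {
    "18-39": [(1, 120, 600), (0, 150, 1000)],
    "40-59": [(1, 60, 300), (0, 200, 1000)],
    "60-75": [(1, 5, 50), (0, 150, 1000)],
}

# One model per age stratum; exp(b1) is the odds ratio for the risk factor
# among crowdsourced versus reference respondents within that stratum.
odds_ratios = {}
for stratum, groups in hypothetical_counts.items():
    _, b1 = fit_logistic(groups)
    odds_ratios[stratum] = math.exp(b1)

for stratum, ratio in odds_ratios.items():
    print(f"{stratum}: odds ratio = {ratio:.3f}")
```

In this sketch an odds ratio near 1 within a stratum means the two sources report the risk factor at similar rates. The invented 40-59 stratum has a 20% rate in both sources, so its odds ratio is 1 by construction. The small crowdsourced count in the invented 60-75 stratum shows how underrepresentation leaves little data to estimate from.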