DePalma Mary Turner, Rizzotti Michael C, Branneman Matthew
Ithaca College, Department of Psychology, Ithaca, NY, United States.
JMIR Diabetes. 2017 Jul 12;2(2):e11. doi: 10.2196/diabetes.7473.
To eliminate health disparities, research will depend on our ability to reach select groups of people (eg, samples of a particular racial or ethnic group with a particular disease); unfortunately, researchers often experience difficulty obtaining high-quality data from samples of sufficient size.
Past studies using Amazon Mechanical Turk (MTurk) applaud its diversity, so our initial objective was to capitalize on that diversity to investigate psychosocial factors related to diabetes self-care.
In Study 1, a "Health Survey" was posted on MTurk to examine diabetes-relevant psychosocial factors. The survey was restricted to individuals who were 18 years of age or older with diabetes. Detection of irregularities in the data, however, prompted an evaluation of the quality of MTurk health-relevant data. This ultimately led to Study 2, which utilized an alert statement to improve conscientious behavior, or the likelihood that participants would be thorough and diligent in their responses. Trap questions were also embedded to assess conscientious behavior.
In Study 1, of 4165 responses, 1246 were generated from 533 unique IP addresses completing the survey multiple times within close temporal proximity. Ultimately, only 252 responses were found to be acceptable. Further analyses indicated additional quality concerns with this subsample. In Study 2, as compared with the MTurk sample (N=316), the undergraduate sample (N=300) included more females, and fewer individuals who were married. The samples did not differ with respect to race. Although the presence of an alert resulted in fewer trap failures (mean=0.07) than when no alert was present (mean=0.11), this difference failed to reach significance: F=2.5, P=.11, η²=.004, power=.35. The modal trap failure response was zero, while the mean was 0.092 (SD=0.32). There were a total of 60 trap failures in a context where the potential could have exceeded 16,000.
Published studies that utilize MTurk participants are rapidly appearing in the health domain. While MTurk may have the potential to be more diverse than an undergraduate sample, our efforts did not meet the criteria for what would constitute a diverse sample in and of itself. Because some researchers have experienced successful data collection on MTurk, while others report disastrous results, Kees et al recently identified the types and magnitude of cheating behavior occurring on Web-based platforms as one essential area of research. The present studies contribute to this dialogue, providing evidence of both disaster and success. Moving forward, it is recommended that researchers employ best practices in survey design and deliberately embed trap questions to assess participant behavior. We strongly suggest that standards be in place for publishing the results of Web-based surveys: standards that protect against publication unless suitable quality assurance tests are built into the survey design, distribution, and analysis.
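As an illustration of the two quality checks discussed above (repeat submissions from one IP address in close temporal proximity, and failures on embedded trap questions), the following is a minimal Python sketch. It is not the authors' procedure: the field names (`id`, `ip`, `submitted`), the 30-minute window, the trap-question key, and the sample records are all hypothetical assumptions chosen for the example, since the abstract does not report the criteria actually used.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def flag_rapid_repeats(responses, window=timedelta(minutes=30)):
    """Return the IDs of responses whose IP address submitted another
    response within `window`. The 30-minute default is an assumption;
    the abstract does not define "close temporal proximity"."""
    by_ip = defaultdict(list)
    for response in responses:
        by_ip[response["ip"]].append(response)

    flagged = set()
    for group in by_ip.values():
        # Compare each submission with the next one from the same IP.
        group.sort(key=lambda r: r["submitted"])
        for prev, cur in zip(group, group[1:]):
            if cur["submitted"] - prev["submitted"] <= window:
                flagged.add(prev["id"])
                flagged.add(cur["id"])
    return flagged


def count_trap_failures(answers, trap_key):
    """Count trap questions whose answer differs from the expected one.
    A missing answer is counted as a failure."""
    return sum(1 for q, expected in trap_key.items()
               if answers.get(q) != expected)


# Made-up records for illustration only (not study data).
responses = [
    {"id": 1, "ip": "203.0.113.5", "submitted": datetime(2016, 1, 1, 10, 0)},
    {"id": 2, "ip": "203.0.113.5", "submitted": datetime(2016, 1, 1, 10, 10)},
    {"id": 3, "ip": "198.51.100.7", "submitted": datetime(2016, 1, 1, 11, 0)},
    {"id": 4, "ip": "203.0.113.5", "submitted": datetime(2016, 1, 2, 9, 0)},
]
suspect_ids = flag_rapid_repeats(responses)

# Hypothetical trap-question key: question ID -> the only acceptable answer.
trap_key = {"q7": "strongly disagree", "q15": "blue"}
failures = count_trap_failures({"q7": "agree", "q15": "blue"}, trap_key)
```

Flagging a response in this way only marks it for review; whether to exclude it remains a judgment for the analyst, because shared IP addresses (for example, households or institutions) can produce legitimate repeat submissions.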