Department of Psychology, Queen's University, Kingston, ON, Canada.
CloudResearch, Queens, NY, USA.
Behav Res Methods. 2023 Dec;55(8):3953-3964. doi: 10.3758/s13428-022-01999-x. Epub 2022 Nov 3.
Maintaining data quality on Amazon Mechanical Turk (MTurk) has always been a concern for researchers. These concerns have grown recently due to the bot crisis of 2018 and observations that past safeguards of data quality (e.g., approval ratings of 95%) no longer work. To address data quality concerns, CloudResearch, a third-party website that interfaces with MTurk, has assessed 165,000 MTurkers and categorized them into those who provide high-quality (100,000; Approved) and low-quality (~65,000; Blocked) data. Here, we examined the predictive validity of CloudResearch's vetting. In a pre-registered study, participants (N = 900) from the Approved and Blocked groups, along with a Standard MTurk sample (95% HIT approval rating, 100+ completed HITs), completed an array of data-quality measures. Across several indices, Approved participants (i) identified the content of images more accurately, (ii) answered more reading comprehension questions correctly, (iii) responded to reverse-coded items more consistently, (iv) passed a greater number of attention checks, (v) self-reported less cheating and actually left the survey window less often on easily Googleable questions, (vi) replicated classic psychology experimental effects more reliably, and (vii) answered AI-stumping questions more accurately than Blocked participants, who performed at chance on multiple outcomes. Data quality of the Standard sample generally fell between that of the Approved and Blocked groups. We discuss how MTurk's approval rating system is no longer an effective data-quality control and the advantages of using the Approved group for scientific studies on MTurk.