Assessing the quality and reliability of the Amazon Mechanical Turk (MTurk) data in 2024.

Authors

Shimoni Hagar, Axelrod Vadim

Affiliation

The Gonda Multidisciplinary Brain Research Center, Bar-Ilan University, Ramat Gan, Israel.

Publication information

R Soc Open Sci. 2025 Jul 16;12(7):250361. doi: 10.1098/rsos.250361. eCollection 2025 Jul.

DOI: 10.1098/rsos.250361

PMID: 40727403

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC12303094/

Abstract

Amazon Mechanical Turk (MTurk) has been one of the most popular platforms for online research in psychology and the social sciences in general. While concerns about MTurk data quality have been raised, the platform continues to be widely used. The question is whether the MTurk platform is suitable for research and, if so, whether it is used optimally. We conducted a systematic investigation of MTurk data quality and reliability, including main and replication experiments, with more than 1300 participants subdivided into three cohorts: (i) workers (i.e. participants on the MTurk platform) with master requirement (i.e. high-performing workers selected by MTurk), (ii) workers without master requirement, and (iii) workers without master requirement, but with a 95% or above approval rate. We found that master workers almost never missed attentional checks, exhibited high reliability and showed no tendency towards straightlining; therefore, these workers are recommended, especially when the naivety of participants is not a strong prerequisite and no large sample size is required. In contrast, the workers without restrictions or with a 95% or above approval-rate threshold missed many attentional checks, exhibited low reliability and showed a tendency towards straightlining, raising serious concerns about the suitability of these workers for research.

Figure 1 (image): https://cdn.ncbi.nlm.nih.gov/pmc/blobs/db44/12303094/9eb2cf62d756/rsos.250361.f001.jpg
