Division of General Internal Medicine and Health Services Research, Department of Medicine, University of California, Los Angeles, CA, United States.
Behavioral and Policy Sciences, RAND Corporation, Santa Monica, CA, United States.
J Med Internet Res. 2023 Aug 4;25:e46421. doi: 10.2196/46421.
Researchers have implemented multiple approaches to increase data quality from existing web-based panels such as Amazon's Mechanical Turk (MTurk).
This study extends prior work by examining how excluding respondents who endorse 1 or both of 2 fake health conditions ("Syndomitis" and "Chekalism") affects data quality and mean estimates of health status.
Survey data were collected in 2021, at baseline and 3 months later, from MTurk study participants who were aged 18 years or older, had an internet protocol address in the United States, and had completed a minimum of 500 previous MTurk "human intelligence tasks." We included questions about demographic characteristics, health conditions (including the 2 fake conditions), and the Patient Reported Outcomes Measurement Information System (PROMIS)-29+2 (version 2.1) preference-based score survey. The 3-month follow-up survey was administered only to those who reported having back pain and did not endorse a fake condition at baseline.
In total, 15% (996/6832) of the sample endorsed at least 1 of the 2 fake conditions at baseline. Compared with those who did not endorse a fake condition, those who did were more likely to identify as male and non-White, were younger, reported more health conditions, and took longer to complete the survey. They also had substantially lower internal consistency reliability on the PROMIS-29+2 scales than those who did not endorse a fake condition: physical function (0.69 vs 0.89), pain interference (0.80 vs 0.94), fatigue (0.80 vs 0.92), depression (0.78 vs 0.92), anxiety (0.78 vs 0.90), sleep disturbance (-0.27 vs 0.84), ability to participate in social roles and activities (0.77 vs 0.92), and cognitive function (0.65 vs 0.77). The sleep disturbance scale was unreliable among those endorsing a fake condition because it includes both positively and negatively worded items. Those who endorsed a fake condition reported significantly worse health scores (except for sleep disturbance) than those who did not. Excluding those who endorsed a fake condition improved the overall mean PROMIS-29+2 (version 2.1) T-scores by 1-2 points and the PROMIS preference-based score by 0.04. Although they had not endorsed a fake condition at baseline, 6% (n=59) of the respondents to the 3-month survey endorsed at least 1 fake condition at follow-up; these respondents had lower PROMIS-29+2 internal consistency reliability and worse mean scores on the 3-month survey than those who did not endorse a fake condition. Based on these results, we estimate that 25% (1708/6832) of the MTurk respondents provided careless or dishonest responses.
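The negative reliability for sleep disturbance can be illustrated with a small sketch. It assumes the reported coefficients are Cronbach's alpha (the abstract does not name the coefficient) and uses an invented 4-item scale with made-up responses, not the study data. When inattentive respondents give nearly the same answer to every item, reverse scoring the negatively worded items makes those items correlate negatively with the rest, which can push alpha below zero.

```python
from statistics import variance

def reverse_score(rows, reversed_idx, lo=1, hi=5):
    """Flip the negatively worded items so all items point the same way."""
    return [
        [(lo + hi - x) if j in reversed_idx else x for j, x in enumerate(row)]
        for row in rows
    ]

def cronbach_alpha(rows):
    """Cronbach's alpha for a list of respondents (rows) by items (columns)."""
    k = len(rows[0])
    item_vars = [variance([row[j] for row in rows]) for j in range(k)]
    total_var = variance([sum(row) for row in rows])
    if total_var == 0:
        raise ValueError("total score has zero variance; alpha is undefined")
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

# Hypothetical raw answers on a 1-5 scale; items at index 2 and 3 are
# negatively worded (illustrative data only, not from the study).
REVERSED = {2, 3}
attentive_raw = [  # answers flip direction on the reversed items
    [1, 1, 5, 4], [2, 2, 4, 4], [3, 3, 2, 3],
    [4, 4, 2, 2], [5, 5, 1, 2], [2, 1, 4, 4],
]
careless_raw = [  # near-identical answer to every item regardless of wording
    [1, 1, 1, 2], [2, 2, 2, 2], [3, 3, 4, 3],
    [4, 4, 4, 4], [5, 5, 5, 4], [2, 1, 2, 2],
]

alpha_attentive = cronbach_alpha(reverse_score(attentive_raw, REVERSED))
alpha_careless = cronbach_alpha(reverse_score(careless_raw, REVERSED))
print(f"attentive: {alpha_attentive:.2f}, careless: {alpha_careless:.2f}")
```

In this toy example the attentive group yields a high alpha and the careless group a negative one, which mirrors the direction of the sleep disturbance result without reproducing its magnitude.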
This study provides evidence that asking about fake health conditions can help to screen out respondents who may be dishonest or careless. We recommend that this approach be used routinely in samples drawn from MTurk.
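As a minimal sketch of how such a screen might be applied during data cleaning, the snippet below splits respondents by whether they endorsed either fake condition. The record layout (`id`, `conditions`) and the sample records are hypothetical and are not taken from the study; only the two fake condition names come from the abstract.

```python
# Names of the 2 fake conditions used in the study.
FAKE_CONDITIONS = {"Syndomitis", "Chekalism"}

def endorsed_fake(conditions):
    """Return True if any endorsed condition is one of the fake ones."""
    return bool(FAKE_CONDITIONS & set(conditions))

def screen(respondents):
    """Split respondents into (kept, flagged) by fake-condition endorsement."""
    kept = [r for r in respondents if not endorsed_fake(r["conditions"])]
    flagged = [r for r in respondents if endorsed_fake(r["conditions"])]
    return kept, flagged

# Hypothetical records for illustration only.
sample = [
    {"id": 1, "conditions": ["back pain"]},
    {"id": 2, "conditions": ["back pain", "Syndomitis"]},
    {"id": 3, "conditions": []},
    {"id": 4, "conditions": ["Chekalism", "Syndomitis"]},
]

kept, flagged = screen(sample)
share_flagged = len(flagged) / len(sample)
print(f"kept {len(kept)}, flagged {len(flagged)} ({share_flagged:.0%})")
```

Keeping the flagged records rather than discarding them allows the two groups to be compared afterward, as the study does for reliability and mean scores.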