Hardesty Jeffrey J, Crespi Elizabeth, Sinamo Joshua K, Nian Qinghua, Breland Alison, Eissenberg Thomas, Kennedy Ryan David, Cohen Joanna E
Institute for Global Tobacco Control, Department of Health, Behavior and Society, Johns Hopkins University, Baltimore, MD, United States.
Center for the Study of Tobacco Products, Department of Psychology, Virginia Commonwealth University, Richmond, VA, United States.
J Med Internet Res. 2024 Dec 16;26:e60184. doi: 10.2196/60184.
In 2019, we launched a web-based longitudinal survey of adults who frequently use e-cigarettes, called the Vaping and Patterns of E-cigarette Use Research (VAPER) Study. The initial attempt to collect survey data failed due to fraudulent survey submissions, likely submitted by survey bots and other survey takers. This paper chronicles the journey from that setback to the successful completion of 5 waves of data collection. The section "Naïve Beginnings" examines the study preparation phase, identifying the events, decisions, and assumptions that contributed to the failure (eg, allowing anonymous survey takers to submit surveys and overreliance on a third-party's proprietary fraud detection tool to identify participants attempting to submit multiple surveys). "A 5-Alarm Fire and Subsequent Investigation" summarizes the warning signs that suggested fraudulent survey submissions had compromised the data integrity after the initial survey launched (eg, an unanticipated acceleration in recruitment and a voicemail alleging fraudulent receipt of multiple gift codes). This section also covers the investigation process, along with conclusions regarding how the methodology was exploited (eg, clearing cookies and using virtual private networks) and the extent of the issue (ie, only 363/1624, 22.4% of the survey completions were likely valid). "Building More Resilient Methodology" details the vulnerabilities and threats that likely compromised the initial survey attempt (eg, anonymity and survey bots); the corresponding mitigation strategies and their benefits and limitations (eg, personal record verification platforms, IP address matching, virtual private network detection services, and CAPTCHA [Completely Automated Public Turing test to tell Computers and Humans Apart]); and the array of strategies that were implemented in future survey attempts. 
"Staying Vigilant" recounts the identification and management of an additional threat that emerged despite the implementation of an array of mitigation strategies, underscoring the need for ongoing vigilance and adaptability. While the precise nature of the threat remains unknown, the evidence suggested that multiple fraudulent surveys were submitted by a single entity or by connected entities that likely did not possess e-cigarettes. To reduce the chance of recurrence, participants were required to submit an authentic photo of their most used e-cigarette. Finally, in "Reflection 4 Years Later," we share insights after completing 5 waves of data collection without uncovering additional threats or vulnerabilities that necessitated further mitigation strategies. Reflections include reasons for confidence in the data's integrity, the scalability and cost-effectiveness of the study protocols, and the potential introduction of sampling bias through recruitment and mitigation strategies. By sharing our journey, we aim to provide valuable insights for researchers facing similar challenges with web-based surveys and for those seeking to minimize such challenges a priori. Our experiences highlight the importance of proactive measures, continuous monitoring, and adaptive problem-solving to ensure the integrity of data collected from participants recruited from web-based platforms.
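As a purely illustrative sketch of one mitigation strategy named above, IP address matching, the following Python snippet flags survey submissions that share an IP address. It is not the VAPER Study's implementation: the record layout, the field names `id` and `ip`, and the function `flag_shared_ips` are all hypothetical, and the example addresses come from documentation-reserved ranges rather than study data.

```python
from collections import defaultdict


def flag_shared_ips(submissions):
    """Return the IDs of submissions whose IP address appears more than once.

    `submissions` is a list of dicts with hypothetical keys "id" and "ip".
    """
    ids_by_ip = defaultdict(list)
    for sub in submissions:
        ids_by_ip[sub["ip"]].append(sub["id"])

    flagged = set()
    for ids in ids_by_ip.values():
        if len(ids) > 1:  # the same address was used for several submissions
            flagged.update(ids)
    return flagged


# Hypothetical records for illustration only (not data from the study).
submissions = [
    {"id": "s1", "ip": "203.0.113.5"},
    {"id": "s2", "ip": "198.51.100.7"},
    {"id": "s3", "ip": "203.0.113.5"},
]
print(sorted(flag_shared_ips(submissions)))
```

Matching of this kind only catches repeat submissions that reuse an address. As the abstract notes, virtual private networks were one way the original methodology was exploited, so a check like this would presumably be one layer among several rather than a safeguard on its own.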