Parker Jayelin N, Rager Theresa L, Burns Jade, Mmeje Okeoma
Department of Obstetrics and Gynecology, University of Michigan, 1500 E Medical Center Dr, Ann Arbor, MI, 48109, United States, 1 734-763-3429, 1 734-647-9727.
Department of Health Behavior and Biological Sciences, University of Michigan School of Nursing, Ann Arbor, MI, United States.
JMIR Form Res. 2024 Dec 9;8:e56788. doi: 10.2196/56788.
As technology continues to shape health research, web-based surveys have become an increasingly common way to collect sexual health information from adolescents and young adults. However, this shift toward digital platforms brings new challenges, particularly the infiltration of automated bots that can compromise data integrity and the reliability of survey results.
We aimed to outline the data verification process used in our study design, which employed survey programming and data cleaning protocols.
A 26-item survey was developed and programmed with several data integrity functions, including reCAPTCHA scores, RelevantID fraud and duplicate scores, verification of IP addresses, and honeypot questions. Participants aged 15-24 years were recruited via social media advertisements over 7 weeks and received a US $15 incentive after survey completion. Data verification occurred through a 2-part cleaning process, which removed responses that were incomplete, flagged as spam by Qualtrics, or from duplicate IP addresses, or those that did not meet the inclusion criteria. Final comparisons of reported age with date of birth and reported state with state inclusion criteria were performed. Participants who completed the study survey were linked to a second survey to receive their incentive. Responses without first and last names and full addresses were removed, as were those with duplicate IP addresses or the exact same longitude and latitude coordinates. Finally, IP addresses used to complete both surveys were compared, and consistent responses were eligible for an incentive.
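The first cleaning pass described above can be sketched roughly as follows. This is a minimal illustration, not the study's actual code: the `Response` fields, the `clean_responses` function, the boolean spam flag, and the choice to drop every response that shares an IP address (rather than keeping the first) are all assumptions made for the example, and the set of eligible states is a placeholder.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date


@dataclass
class Response:
    # Hypothetical fields; the real survey export would differ.
    response_id: str
    ip_address: str
    finished: bool
    flagged_spam: bool      # assumed to summarize the platform's spam flag
    reported_age: int
    date_of_birth: date
    submitted_on: date
    state: str


def age_on(dob: date, when: date) -> int:
    """Age in whole years on the date `when`."""
    years = when.year - dob.year
    if (when.month, when.day) < (dob.month, dob.day):
        years -= 1  # birthday has not yet occurred in that year
    return years


def clean_responses(responses, eligible_states, min_age=15, max_age=24):
    """Keep responses that pass every check in the first cleaning pass."""
    # Count IPs across all submissions (assumption: any repeat is suspicious).
    ip_counts = Counter(r.ip_address for r in responses)
    kept = []
    for r in responses:
        if not r.finished or r.flagged_spam:
            continue  # incomplete or flagged as spam
        if ip_counts[r.ip_address] > 1:
            continue  # duplicate IP address
        if not (min_age <= r.reported_age <= max_age):
            continue  # outside the age inclusion criteria
        if age_on(r.date_of_birth, r.submitted_on) != r.reported_age:
            continue  # reported age disagrees with date of birth
        if r.state not in eligible_states:
            continue  # outside the state inclusion criteria
        kept.append(r)
    return kept
```

The second pass (matching the IP address of the incentive survey against the study survey, and removing entries with missing names, missing addresses, or identical coordinates) could follow the same filter-and-continue pattern.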
Over 7 weeks, online advertisements for a web-based survey reached 1.4 million social media users. Of the 20,585 survey responses received, 4589 (22.3%) were verified. Incentives were sent to 462 participants; of these, 14 responses were duplicates and 3 contained discrepancies, resulting in a final sample of 445 responses.
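The reported figures can be checked with simple arithmetic. The variable names below are illustrative only; the numbers are taken directly from the results above.

```python
received = 20585        # survey responses received
verified = 4589         # responses that passed verification
incentives_sent = 462   # participants who were sent an incentive
duplicates = 14         # duplicate responses among those
discrepancies = 3       # responses with discrepancies among those

# Share of received responses that were verified, to one decimal place.
verified_pct = round(100 * verified / received, 1)

# Final analytic sample after removing duplicates and discrepancies.
final_sample = incentives_sent - duplicates - discrepancies

print(verified_pct, final_sample)  # prints "22.3 445"
```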
Confidential web-based surveys are an appealing method for reaching populations, particularly adolescents and young adults, who may be reluctant to disclose sensitive information to family, friends, or clinical providers. They are also a useful tool for researchers studying hard-to-reach populations, from which a representative sample is otherwise difficult to obtain. However, researchers face the ongoing threat of bots and fraudulent participants in a technology-driven world, necessitating the adoption of evolving bot detection software and data collection protocols tailored to each study context.