Ballard April M, Cardwell Trey, Young April M
Department of Epidemiology, College of Public Health, University of Kentucky, Lexington, KY, United States.
Department of Environmental Health, Emory University, Atlanta, GA, United States.
JMIR Public Health Surveill. 2019 Feb 4;5(1):e12344. doi: 10.2196/12344.
The Internet is becoming an increasingly common tool for survey research, particularly among "hidden" or vulnerable populations, such as men who have sex with men (MSM). Web-based research has many advantages for participants and researchers, but fraud can present a significant threat to data integrity.
The purpose of this analysis was to evaluate fraud detection strategies in a Web-based survey of young MSM and describe new protocols to improve fraud detection in Web-based survey research.
This study involved a cross-sectional Web-based survey that examined individual- and network-level risk factors for HIV transmission and substance use among young MSM residing in 15 counties in Central Kentucky. Each survey entry that was at least 50% complete was evaluated by study staff for fraud using an algorithm of 8 criteria based on a combination of geolocation data, survey data, and personal information. Entries were classified as fraudulent, potentially fraudulent, or valid. Descriptive analyses were performed to summarize how often each fraud detection criterion was violated among entries.
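The classification step described above can be illustrated with a minimal sketch. It is not the study's algorithm: the abstract does not list all 8 criteria or the rule used to combine them, so the criterion names below are taken only from those mentioned in the results, and the `Entry` type, the `classify` function, and the `fraud_threshold` cutoff are hypothetical choices made for illustration.

```python
from dataclasses import dataclass

# Criteria named in the abstract's results. The full 8-criterion list is not
# given there, so this tuple is an incomplete, illustrative subset.
CRITERIA = (
    "geolocation_outside_study_area",
    "invalid_phone_number",
    "mismatched_names",
    "unusual_email_address",
    "matches_previous_entry",
)


@dataclass(frozen=True)
class Entry:
    """Hypothetical record for one survey entry."""
    entry_id: str
    percent_complete: float
    flags: frozenset = frozenset()  # names of the criteria this entry violates


def classify(entry, fraud_threshold=2, min_complete=50.0):
    """Classify an entry by counting violated criteria.

    Assumption (not from the source): no violations means 'valid', at least
    `fraud_threshold` violations means 'fraud', anything in between means
    'potential fraud'. Entries under 50% complete are not evaluated.
    """
    if entry.percent_complete < min_complete:
        return "not evaluated"
    unknown = set(entry.flags) - set(CRITERIA)
    if unknown:
        raise ValueError(f"unknown criteria: {sorted(unknown)}")
    violations = len(entry.flags)
    if violations == 0:
        return "valid"
    if violations >= fraud_threshold:
        return "fraud"
    return "potential fraud"


if __name__ == "__main__":
    entries = [
        Entry("e1", 100.0),
        Entry("e2", 75.0, frozenset({"geolocation_outside_study_area"})),
        Entry("e3", 90.0, frozenset({"invalid_phone_number", "mismatched_names"})),
        Entry("e4", 30.0, frozenset({"unusual_email_address"})),
    ]
    for e in entries:
        print(e.entry_id, classify(e))
```

A simple count is only one way to combine indicators. Because the results show that a single criterion such as geolocation can flag participants who were in fact eligible, a real protocol would likely pair any such rule with manual review by study staff.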
Of the 414 survey entries, the final categorization identified 119 (28.7%) entries as fraud, 42 (10.1%) as potential fraud, and 253 (61.1%) as valid. Geolocation outside of the study area (164/414, 39.6%) was the most frequently violated criterion. However, 33.3% (82/246) of the entries that had ineligible geolocations belonged to participants who were in eligible locations (as verified by their request to mail payment to an address within the study area or participation at a local event). The second most frequently violated criterion was an invalid phone number (94/414, 22.7%), followed by mismatching names within an entry (43/414, 10.4%) and unusual email addresses (37/414, 8.9%). Fewer than 5% (18/414) of the entries had some combination of personal information items matching that of a previous entry.
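As a quick arithmetic check, the reported percentages can be recomputed from the counts given above. The snippet is only a sketch: the dictionary keys and the `pct` helper are names introduced here for illustration, and the counts are copied from the text.

```python
def pct(numerator, denominator):
    """Percentage rounded to one decimal place, as reported in the abstract."""
    return round(100 * numerator / denominator, 1)


# Final categorization counts copied from the results.
categories = {"fraud": 119, "potential fraud": 42, "valid": 253}
total = sum(categories.values())  # should equal the 414 evaluated entries

# Per-criterion violation counts copied from the results (denominator 414).
violations = {
    "geolocation outside study area": 164,
    "invalid phone number": 94,
    "mismatching names": 43,
    "unusual email address": 37,
    "matches a previous entry": 18,
}

if __name__ == "__main__":
    print("total entries:", total)
    for name, n in categories.items():
        print(f"{name}: {n}/{total} = {pct(n, total)}%")
    for name, n in violations.items():
        print(f"{name}: {n}/{total} = {pct(n, total)}%")
```

The criterion counts sum to more than the number of flagged entries, which is consistent with a single entry being able to violate several criteria at once.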
This study suggests that researchers conducting Web-based surveys of MSM should be vigilant about the potential for fraud. Researchers should have a fraud detection algorithm in place prior to data collection and should not rely on the Internet Protocol (IP) address or geolocation alone, but should rather use a combination of indicators.