
A hierarchical clustering approach to identify repeated enrollments in web survey data.

Affiliations

Biostatistics and Bioinformatics Facility, Fox Chase Cancer Center, Philadelphia, PA, United States of America.

Department of Medical Oncology, Rutgers Cancer Institute of New Jersey, New Brunswick, New Jersey, United States of America.

Publication information

PLoS One. 2018 Sep 25;13(9):e0204394. doi: 10.1371/journal.pone.0204394. eCollection 2018.

DOI: 10.1371/journal.pone.0204394
PMID: 30252908
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC6155511/

Abstract

INTRODUCTION

Online surveys are a valuable tool for social science research, but the perceived anonymity of online administration may lead to problematic behavior by study participants. In particular, if a study offers incentives, some participants may attempt to enroll multiple times. We propose a method to identify clusters of non-independent enrollments in a web-based study, motivated by an analysis of survey data from a study testing the effectiveness of an online skin-cancer risk reduction program.

METHODS

To identify groups of enrollments, we applied a hierarchical clustering algorithm to the Euclidean distance matrix formed by participant responses to a series of Likert-type eligibility questions. We then systematically identified clusters that were unusual in both size and similarity by repeatedly simulating datasets from the empirical distribution of responses under the assumption of independent enrollments. Running the clustering algorithm on the simulated datasets gave the distribution of cluster size and similarity under independence, which we then used to identify groups of outliers in the observed data. Next, we assessed 12 other quality indicators, including previously proposed and study-specific measures. We summarized the quality measures by cluster membership and compared the cluster groupings with those obtained by applying latent class modeling to the quality indicators.
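The clustering and simulation steps can be illustrated with a minimal sketch. It is not the authors' implementation and rests on several assumptions that the abstract does not confirm: single linkage is used, the dendrogram is cut at a fixed height, each question is resampled independently from its observed marginal distribution to mimic independent enrollments, and only cluster size (not similarity) is compared with the simulated reference distribution. All function names, parameters, and the toy data are hypothetical.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two response vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def single_linkage_clusters(rows, height):
    """Cut a single-linkage dendrogram at `height` (assumed linkage).

    This is equivalent to taking the connected components of the graph
    that joins two rows whenever their distance is <= height.
    Returns lists of row indices, largest cluster first.
    """
    parent = list(range(len(rows)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for i in range(len(rows)):
        for j in range(i + 1, len(rows)):
            if euclidean(rows[i], rows[j]) <= height:
                parent[find(i)] = find(j)

    groups = {}
    for i in range(len(rows)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values(), key=len, reverse=True)


def simulate_independent(rows, rng):
    """Simulate a same-sized dataset by resampling every question
    independently from its observed marginal (a simplifying assumption)."""
    columns = list(zip(*rows))
    return [tuple(rng.choice(col) for col in columns) for _ in rows]


def flag_unusual_clusters(rows, height=0.0, n_sim=200, quantile=0.95, seed=0):
    """Flag observed clusters larger than the chosen quantile of the
    largest cluster size seen in the simulated datasets."""
    rng = random.Random(seed)
    null_sizes = sorted(
        len(single_linkage_clusters(simulate_independent(rows, rng), height)[0])
        for _ in range(n_sim)
    )
    cutoff = null_sizes[min(n_sim - 1, int(quantile * n_sim))]
    clusters = single_linkage_clusters(rows, height)
    return [c for c in clusters if len(c) > cutoff], cutoff


def base5_row(value, n_questions=8):
    """Toy helper: encode an integer as 8 Likert answers in 1..5."""
    return tuple((value // 5 ** k) % 5 + 1 for k in range(n_questions))


# Hypothetical toy data: 60 distinct response patterns plus one pattern
# submitted 12 times (rows 60..71), standing in for repeated enrollments.
rows = [base5_row(i * 6151) for i in range(60)] + [base5_row(60 * 6151)] * 12
flagged, cutoff = flag_unusual_clusters(rows, height=0.0, n_sim=100)
print("cutoff:", cutoff, "flagged cluster sizes:", [len(c) for c in flagged])
```

With `height=0.0` only identical response vectors are grouped, so the twelve repeated rows form the single flagged cluster in this toy example. Raising `height` would also group near-identical responses, and a fuller version would compare within-cluster similarity against the simulated datasets as well, as the abstract describes.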

RESULTS AND CONCLUSIONS

When we excluded the clustered enrollments and/or lower-quality latent classes from the analysis of study outcomes, the estimates of the intervention effect were larger. This demonstrates how including repeat or low-quality participants can introduce bias into a web-based study. As far as possible, web-based surveys should be designed to verify participant quality. Our method can be used to verify survey quality and identify problematic groups of enrollments when necessary.

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3a34/6155511/23246e4496a8/pone.0204394.g001.jpg
Figure 2: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3a34/6155511/cb5748183cf3/pone.0204394.g002.jpg
