Verdery Ashton M, Fisher Jacob C, Siripong Nalyn, Abdesselam Kahina, Bauldry Shawn
Pennsylvania State University.
Duke University.
Sociol Methodol. 2017 Aug;47(1):274-306. doi: 10.1177/0081175017716489. Epub 2017 Jul 6.
Respondent-driven sampling (RDS) is a popular method for sampling hard-to-survey populations that leverages social network connections through peer recruitment. While RDS is most frequently applied to estimate the prevalence of infections and risk behaviors of interest to public health, such as HIV/AIDS or condom use, it is rarely used to draw inferences about the structural properties of social networks among such populations because it does not typically collect the necessary data. Drawing on recent advances in computer science, we introduce a set of data collection instruments and RDS estimators for network clustering, an important topological property that has been linked to a network's potential for diffusion of information, disease, and health behaviors. We use simulations to explore how these estimators, originally developed for random walk samples of computer networks, perform when applied to RDS samples whose characteristics, as encountered in realistic field settings, depart from random walks. In particular, we explore the effects of multiple seeds, sampling without versus with replacement, branching chains, imperfect response rates, preferential recruitment, and misreporting of ties. We find that clustering coefficient estimators retain desirable properties in RDS samples. This paper takes an important step toward calculating network characteristics using nontraditional sampling methods, and it expands the potential of RDS to tell researchers more about hidden populations and the social factors driving disease prevalence.
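The abstract does not spell out the estimators. As a minimal sketch, assuming ratio-type random-walk estimators in the style of Hardiman and Katzir's work on estimating clustering from random walks (not necessarily the exact form the authors adopt), the idealized case looks like this: a single-seed, with-replacement simple random walk in which, for each interior step k, only the degree d_k of the visited node and an indicator phi_k (whether the previous and next nodes on the walk are tied) are recorded. In RDS terms, phi_k would correspond to asking whether a respondent's recruiter and recruit know each other; that mapping, the adjacency-set `Graph` type, and the helper names `random_walk`, `estimate_clustering`, and `true_clustering` are all hypothetical illustrations. The average local clustering estimate is mean(phi/(d-1)) / mean(1/d), and the global clustering (transitivity) estimate is mean(phi*d) / mean(d-1).

```python
import random
from typing import Dict, List, Set, Tuple

# Hypothetical representation: undirected graph as node -> set of neighbors.
Graph = Dict[int, Set[int]]


def random_walk(graph: Graph, start: int, steps: int, rng: random.Random) -> List[int]:
    """Simple random walk with replacement from one seed (assumes start has neighbors)."""
    neighbors = {v: sorted(nbrs) for v, nbrs in graph.items()}
    walk = [start]
    for _ in range(steps - 1):
        walk.append(rng.choice(neighbors[walk[-1]]))  # uniform neighbor of current node
    return walk


def estimate_clustering(graph: Graph, walk: List[int]) -> Tuple[float, float]:
    """Ratio estimates of (average local clustering, global clustering) from one walk.

    Uses only visited-node degrees and phi_k = 1 if walk[k-1] and walk[k+1] are tied.
    """
    if len(walk) < 3:
        raise ValueError("walk needs at least 3 nodes")
    inner = range(1, len(walk) - 1)  # positions with both a predecessor and a successor
    phi = [1 if walk[k + 1] in graph[walk[k - 1]] else 0 for k in inner]
    deg = [len(graph[walk[k]]) for k in inner]
    n = len(phi)
    # Local: phi is always 0 at degree-1 nodes (prev == next), so those terms are 0.
    num_local = sum(p / (d - 1) for p, d in zip(phi, deg) if d > 1) / n
    den_local = sum(1 / d for d in deg) / n
    # Global (transitivity): reweight phi by degree, normalize by mean(d - 1).
    num_global = sum(p * d for p, d in zip(phi, deg)) / n
    den_global = sum(d - 1 for d in deg) / n
    return num_local / den_local, num_global / den_global


def true_clustering(graph: Graph) -> Tuple[float, float]:
    """Exact average local clustering (degree < 2 counts as 0) and transitivity."""
    local, tri_sum, triples = [], 0, 0
    for v, nbrs in graph.items():
        d = len(nbrs)
        # t = number of edges among v's neighbors (triangles through v)
        t = sum(1 for u in nbrs for w in nbrs if u < w and w in graph[u])
        local.append(2 * t / (d * (d - 1)) if d > 1 else 0.0)
        tri_sum += t
        triples += d * (d - 1) // 2
    return sum(local) / len(local), tri_sum / triples


if __name__ == "__main__":
    # Triangle 0-1-2 with a tail 2-3-4: true values are 7/15 and 1/2.
    g = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4}, 4: {3}}
    walk = random_walk(g, 0, 200_000, random.Random(1))
    print("estimated:", estimate_clustering(g, walk))
    print("true:     ", true_clustering(g))
```

The degree reweighting is what corrects for the walk visiting nodes in proportion to their degree. Field RDS departs from this sketch through multiple seeds, sampling without replacement, branching recruitment chains, nonresponse, preferential recruitment, and misreported ties, which are the conditions the abstract says the authors examine by simulation.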