Gile Krista J, Handcock Mark S
Nuffield College, University of Oxford.
Sociol Methodol. 2010 Aug;40(1):285-327. doi: 10.1111/j.1467-9531.2010.01223.x.
Respondent-Driven Sampling (RDS) employs a variant of a link-tracing network sampling strategy to collect data from hard-to-reach populations. By tracing the links in the underlying social network, the process exploits the social structure to expand the sample and reduce its dependence on the initial (convenience) sample. The current estimators of population averages make strong assumptions in order to treat the data as a probability sample. We evaluate three critical sensitivities of the estimators: to bias induced by the initial sample, to uncontrollable features of respondent behavior, and to the without-replacement structure of sampling. Our analysis indicates: (1) that the convenience sample of seeds can induce bias, and the number of sample waves typically used in RDS is likely insufficient for the type of nodal mixing required to obtain the reputed asymptotic unbiasedness; (2) that preferential referral behavior by respondents leads to bias; (3) that when a substantial fraction of the target population is sampled the current estimators can have substantial bias. This paper sounds a cautionary note for the users of RDS. While current RDS methodology is powerful and clever, the favorable statistical properties claimed for the current estimates are shown to be heavily dependent on often unrealistic assumptions. We recommend ways to improve the methodology.
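To make concrete what treating RDS data as a probability sample can involve, the following is a minimal sketch of an inverse-degree weighted mean, the form usually associated with the Volz-Heckathorn (RDS-II) estimator. The abstract does not name the estimators it evaluates, so the choice of this estimator is an assumption made for illustration, and the function name and toy data are hypothetical. The weighting is justified only if each respondent's sampling probability is proportional to their network degree, which is the kind of strong assumption the abstract refers to.

```python
def inverse_degree_weighted_mean(values, degrees):
    """Estimate a population mean by weighting each respondent by 1/degree.

    Assumption (not from the abstract): sampling probability is
    proportional to degree, so 1/degree acts as an inverse-probability
    weight. Seed bias, preferential referral, or a large sampling
    fraction can each make that assumption fail.
    """
    if len(values) != len(degrees):
        raise ValueError("values and degrees must have the same length")
    if len(values) == 0:
        raise ValueError("at least one respondent is required")
    if any(d <= 0 for d in degrees):
        raise ValueError("every reported degree must be positive")

    weights = [1.0 / d for d in degrees]
    weighted_total = sum(w * y for w, y in zip(weights, values))
    return weighted_total / sum(weights)


# Hypothetical toy sample: 1 = has the trait, 0 = does not.
trait = [1, 1, 0, 0]
reported_degree = [10, 10, 2, 2]

naive_mean = sum(trait) / len(trait)
weighted_mean = inverse_degree_weighted_mean(trait, reported_degree)
print(f"naive mean: {naive_mean:.3f}, inverse-degree weighted mean: {weighted_mean:.3f}")
```

In this toy sample the respondents with the trait report high degrees, so the weighted estimate falls below the unweighted sample mean. The correction is only as good as the proportional-to-degree assumption behind it.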