Department of Global Health and Social Medicine, Harvard Medical School, Boston, Massachusetts, USA.
Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, Massachusetts, USA.
Stat Med. 2023 Mar 30;42(7):917-935. doi: 10.1002/sim.9650. Epub 2023 Jan 17.
Cluster-based outcome-dependent sampling (ODS) has the potential to yield efficiency gains when the outcome of interest is relatively rare and resource constraints allow only a limited number of clusters to be visited for data collection. Previous research has shown that when the intended analysis uses inverse-probability weighted generalized estimating equations and the number of clusters that can be sampled is fixed, optimally allocating the (cluster-level) sample size across strata defined by auxiliary variables readily available at the design stage can increase efficiency in estimating the parameter(s) of interest. In this setting, however, the optimal allocation formulae depend on quantities that are unknown in practice, which currently makes such designs difficult to implement. In this paper, we consider a two-wave adaptive sampling approach, in which data are collected from a first-wave sample and subsequently used to compute the optimal second-wave stratum-specific sample sizes. We consider two strategies for estimating the necessary components from the first-wave data: an inverse-probability weighting (IPW) approach and a multiple imputation (MI) approach. In a comprehensive simulation study, we show that the adaptive sampling approach performs well and that the MI approach yields near-optimal designs regardless of the covariate type. The IPW approach, on the other hand, yields mixed results. Finally, we illustrate the proposed adaptive sampling procedures with data on maternal characteristics and birth outcomes among women enrolled in the Safer Deliveries program in Zanzibar, Tanzania.
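The two-wave idea can be illustrated with a minimal sketch. The abstract does not state the optimal allocation formulae for inverse-probability weighted GEE, so the sketch substitutes classical Neyman allocation for stratified sampling, in which the total number of clusters sampled in stratum h is proportional to N_h S_h (clusters available in the stratum times a within-stratum standard deviation). This substitution is an assumption and is not the authors' method. The function names, the inputs, and the greedy integer rounding are hypothetical; the wave-1 standard deviations `s` stand in for whatever quantities the first wave is used to estimate (by IPW or MI in the paper).

```python
from typing import List, Sequence


def neyman_targets(N: Sequence[int], s: Sequence[float], n_total: int) -> List[float]:
    """Continuous Neyman allocation of n_total clusters: n_h proportional to N_h * s_h.

    Stand-in for the paper's IPW-GEE allocation formulae (assumption).
    """
    weights = [N_h * s_h for N_h, s_h in zip(N, s)]
    total = sum(weights)
    return [n_total * w / total for w in weights]


def second_wave_sizes(
    N: Sequence[int], n1: Sequence[int], s: Sequence[float], n_total: int
) -> List[int]:
    """Hypothetical helper: integer wave-2 cluster counts per stratum.

    N: clusters available per stratum; n1: clusters already sampled in wave 1;
    s: wave-1 estimates of a within-stratum standard deviation (assumed input);
    n_total: overall cluster budget across both waves.
    """
    remaining = n_total - sum(n1)
    capacity = [N_h - n1_h for N_h, n1_h in zip(N, n1)]
    if remaining < 0 or remaining > sum(capacity):
        raise ValueError("budget is inconsistent with wave-1 sizes or stratum sizes")
    target = neyman_targets(N, s, n_total)
    n2 = [0] * len(N)
    for _ in range(remaining):
        # Strata that still have unsampled clusters.
        open_strata = [h for h in range(len(N)) if n2[h] < capacity[h]]
        # Give the next cluster to the stratum furthest below its target;
        # wave-1 clusters cannot be returned, so over-sampled strata just get none.
        h = max(open_strata, key=lambda j: target[j] - n1[j] - n2[j])
        n2[h] += 1
    return n2


if __name__ == "__main__":
    # Illustrative numbers only, not data from the Safer Deliveries program.
    print(second_wave_sizes(N=[100, 100], n1=[10, 10], s=[1.0, 3.0], n_total=40))
```

With equal stratum sizes and a first wave of 10 clusters per stratum, the stratum with the larger estimated standard deviation receives all 20 second-wave clusters, bringing the totals to the Neyman targets of 10 and 30. Because wave-1 clusters are already sampled, a stratum that was over-sampled in the first wave simply receives no second-wave clusters, and no stratum is asked for more clusters than it has left.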