Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, Massachusetts, USA.
Department of Global Health and Social Medicine, Harvard Medical School, Boston, Massachusetts, USA.
Stat Med. 2021 Aug 15;40(18):4090-4107. doi: 10.1002/sim.9016. Epub 2021 Jun 2.
In public health research, finite resources often require decisions at the study design stage about which individuals to sample for detailed data collection. When study units are naturally clustered, as patients are within clinics, it may be preferable to sample clusters rather than individual units, especially when the cost of travel between clusters is high. In this setting, aggregated data on the outcome and selected covariates are sometimes routinely available through, for example, a country's Health Management Information System. Used wisely, this information can guide decisions about which clusters to sample and may yield efficiency gains over simple random sampling. In this article, we derive a series of formulas for the optimal allocation of resources when a single-stage, stratified, cluster-based outcome-dependent sampling design is used and a marginal mean model is specified to answer the question of interest. Specifically, we consider two settings: (i) when a particular parameter in the mean model is of primary interest, and (ii) when multiple parameters are of interest. We investigate the finite-population performance of the optimal allocation framework through a comprehensive simulation study. Our results show that there are trade-offs to consider at the design stage: optimizing for one parameter yields efficiency gains over balanced and simple random sampling but results in losses for the other parameters in the model. Optimizing for all parameters simultaneously yields smaller efficiency gains but mitigates the losses for the other parameters.
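The abstract does not reproduce the authors' allocation formulas. As a rough, hedged illustration of the single-parameter versus multiple-parameter trade-off, the Python sketch below uses a generic Neyman-type allocation rule, not the paper's derivation. It assumes an inverse-probability-weighted estimating equation for a two-coefficient marginal mean model, clusters drawn with replacement within each stratum (so no finite population correction), and made-up stratum-level matrices `A_h` (expected derivative of one cluster's estimating function) and `B_h` (within-stratum covariance of that estimating function). Under these assumptions the variance is approximately A^{-1}(Σ_h N_h² B_h / n_h)A^{-1} with A = Σ_h N_h A_h, so both the single-coefficient criterion and the "all parameters" (trace) criterion reduce to Σ_h a_h / n_h, which is minimized for a fixed total by n_h ∝ √a_h. The helper names (`stratum_weights`, `optimal_allocation`, `approx_variance`) and all numbers are hypothetical.

```python
import math

# Tiny 2x2 helpers so the sketch needs no third-party packages.
def inv2(m):
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def matmul2(x, y):
    return [[sum(x[i][k] * y[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def stratum_weights(N, A_h, B_h, target):
    """Hypothetical helper: a_h such that Var ~= sum_h a_h / n_h.

    target=0 or 1 -> variance of that single coefficient;
    target="trace" -> sum of both coefficient variances (all parameters at once).
    Assumes IPW weights N_h / n_h and with-replacement sampling of clusters.
    """
    H = len(N)
    # Population-level "bread": A = sum_h N_h * A_h (does not depend on n_h).
    A = [[sum(N[h] * A_h[h][i][j] for h in range(H)) for j in range(2)] for i in range(2)]
    A_inv = inv2(A)
    weights = []
    for h in range(H):
        M = matmul2(matmul2(A_inv, B_h[h]), A_inv)
        crit = M[0][0] + M[1][1] if target == "trace" else M[target][target]
        # Weighting by N_h / n_h scales stratum h's "meat" by N_h^2 / n_h.
        weights.append(N[h] ** 2 * crit)
    return weights

def optimal_allocation(a, n_total):
    """Continuous minimiser of sum_h a_h / n_h subject to sum_h n_h = n_total.

    A Lagrange-multiplier argument gives n_h proportional to sqrt(a_h).
    """
    roots = [math.sqrt(x) for x in a]
    total = sum(roots)
    return [n_total * r / total for r in roots]

def approx_variance(a, n):
    return sum(x / nh for x, nh in zip(a, n))

# Hypothetical inputs: three strata of clusters formed from aggregated outcome data.
N = [100, 60, 40]  # clusters available in each stratum
A_h = [[[1.0, x], [x, 1.0 + x * x]] for x in (-1.0, 0.0, 1.0)]
B_h = [[[4.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 9.0]]]
n_total = 30  # clusters the budget allows us to sample

a_int = stratum_weights(N, A_h, B_h, target=0)
a_slope = stratum_weights(N, A_h, B_h, target=1)
a_trace = stratum_weights(N, A_h, B_h, target="trace")

designs = {
    "slope-optimal": optimal_allocation(a_slope, n_total),
    "trace-optimal": optimal_allocation(a_trace, n_total),
    "balanced": [n_total / len(N)] * len(N),
    # Proportional allocation is used here only as a stand-in for simple random sampling.
    "proportional": [n_total * Nh / sum(N) for Nh in N],
}
for name, n in designs.items():
    print(f"{name:14s} n_h={[round(v, 1) for v in n]} "
          f"Var(b0)={approx_variance(a_int, n):.5f} Var(b1)={approx_variance(a_slope, n):.5f}")
```

The allocations are continuous; a real design would round them to integers, enforce n_h ≤ N_h, and might account for unequal per-cluster costs, none of which this sketch attempts. Comparing the printed Var(b0) and Var(b1) columns across designs lets one check whether targeting one coefficient shifts precision away from the other.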