From the Harvard T.H. Chan School of Public Health, Boston, MA.
Epidemiology. 2018 Jan;29(1):50-57. doi: 10.1097/EDE.0000000000000763.
In resource-limited settings, long-term evaluation of national antiretroviral treatment (ART) programs often relies on aggregated data, the analysis of which may be subject to ecological bias. As researchers and policy makers consider evaluating individual-level outcomes such as treatment adherence or mortality, the well-known case-control design is appealing because it provides efficiency gains over random sampling. In the context that motivates this article, valid estimation and inference require acknowledging any clustering; to our knowledge, however, no statistical methods have been published for the analysis of case-control data in which the underlying population exhibits clustering. Furthermore, in the specific context of an ongoing collaboration in Malawi, case-control sampling within clinics has been suggested as a more practical strategy than case-control sampling across all clinics. Although similar outcome-dependent sampling schemes have been described in the literature, a case-control design specific to correlated data settings is, to our knowledge, new. In this article, we describe this design, discuss balanced versus unbalanced sampling techniques, and provide a general approach to analyzing case-control studies in cluster-correlated settings based on inverse probability-weighted generalized estimating equations. Inference is based on a robust sandwich estimator, with correlation parameters estimated so that the outcome-dependent sampling scheme is appropriately accounted for. We conduct comprehensive simulations, based in part on real data from a sample of N = 78,155 program registrants in Malawi between 2005 and 2007, to evaluate small-sample operating characteristics and the potential trade-offs associated with standard case-control sampling versus case-control sampling performed within clusters.
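The general idea of within-cluster case-control sampling followed by an inverse probability-weighted analysis can be illustrated with a minimal sketch. It is not the authors' implementation. It makes several simplifying assumptions that are not in the abstract: the population is simulated with a hypothetical cluster random intercept, there is a single binary covariate, the working correlation is independence (the article estimates correlation parameters instead), and the sampling weights are treated as known constants. All function names and parameter values are illustrative.

```python
import math
import random


def expit(x):
    return 1.0 / (1.0 + math.exp(-x))


def simulate_population(n_clusters=40, cluster_size=400, seed=1):
    """Simulate clustered binary outcomes (all parameter values are assumptions)."""
    rng = random.Random(seed)
    rows = []  # each row: (cluster id, covariate x, outcome y)
    for k in range(n_clusters):
        b = rng.gauss(0.0, 0.5)  # cluster random intercept induces correlation
        for _ in range(cluster_size):
            x = 1.0 if rng.random() < 0.5 else 0.0
            y = 1 if rng.random() < expit(-2.5 + 0.7 * x + b) else 0
            rows.append((k, x, y))
    return rows


def sample_within_clusters(rows, n_cases=20, n_controls=20, seed=2):
    """Draw cases and controls separately inside every cluster.

    Returns rows (cluster, x, y, weight), where weight is the inverse of the
    sampling probability for that cluster-by-outcome stratum.
    """
    rng = random.Random(seed)
    strata = {}
    for k, x, y in rows:
        strata.setdefault((k, y), []).append(x)
    sample = []
    for (k, y), xs in sorted(strata.items()):
        m = min(len(xs), n_cases if y == 1 else n_controls)
        weight = len(xs) / m
        for x in rng.sample(xs, m):
            sample.append((k, x, y, weight))
    return sample


def score_pieces(sample, b0, b1):
    """Weighted information matrix entries and per-cluster score sums."""
    a00 = a01 = a11 = 0.0
    scores = {}
    for k, x, y, w in sample:
        p = expit(b0 + b1 * x)
        r = w * (y - p)          # weighted residual
        v = w * p * (1.0 - p)    # weighted variance function
        s = scores.setdefault(k, [0.0, 0.0])
        s[0] += r
        s[1] += r * x
        a00 += v
        a01 += v * x
        a11 += v * x * x
    return (a00, a01, a11), scores


def fit_ipw_gee(sample, n_iter=25):
    """Weighted logistic estimating equations under working independence,
    with a cluster-robust sandwich variance. Returns (coefficients, SEs)."""
    b0 = b1 = 0.0
    for _ in range(n_iter):  # Newton-Raphson updates
        (a00, a01, a11), scores = score_pieces(sample, b0, b1)
        u0 = sum(s[0] for s in scores.values())
        u1 = sum(s[1] for s in scores.values())
        det = a00 * a11 - a01 * a01
        b0 += (a11 * u0 - a01 * u1) / det
        b1 += (a00 * u1 - a01 * u0) / det
    (a00, a01, a11), scores = score_pieces(sample, b0, b1)
    det = a00 * a11 - a01 * a01
    i00, i01, i11 = a11 / det, -a01 / det, a00 / det  # inverse "bread"
    m00 = sum(s[0] * s[0] for s in scores.values())   # "meat": cluster sums
    m01 = sum(s[0] * s[1] for s in scores.values())
    m11 = sum(s[1] * s[1] for s in scores.values())
    var0 = i00 * i00 * m00 + 2 * i00 * i01 * m01 + i01 * i01 * m11
    var1 = i01 * i01 * m00 + 2 * i01 * i11 * m01 + i11 * i11 * m11
    return (b0, b1), (math.sqrt(var0), math.sqrt(var1))


if __name__ == "__main__":
    population = simulate_population()
    sample = sample_within_clusters(population)
    coefs, ses = fit_ipw_gee(sample)
    print("sampled", len(sample), "of", len(population))
    print("intercept, slope:", coefs)
    print("robust SEs:", ses)
```

Because the weights restore the population proportions of cases and controls within each cluster, the weighted estimating equations target the population-averaged (marginal) regression parameters rather than the cluster-specific ones. Summing score contributions within clusters before forming the middle of the sandwich is what allows for within-cluster correlation in this sketch.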