Gail M H, Mark S D, Carroll R J, Green S B, Pee D
National Cancer Institute, Division of Cancer Etiology, Bethesda, MD 20892-7368, USA.
Stat Med. 1996 Jun 15;15(11):1069-92. doi: 10.1002/(SICI)1097-0258(19960615)15:11<1069::AID-SIM220>3.0.CO;2-Q.
This paper discusses design considerations and the role of randomization-based inference in randomized community intervention trials. We stress that longitudinal follow-up of cohorts within communities often yields useful information on the effects of intervention on individuals, whereas cross-sectional surveys can usefully assess the impact of intervention on group indices of health. We also discuss briefly special design considerations, such as sampling cohorts from targeted subpopulations (for example, heavy smokers), matching the communities, calculating sample size, and other practical issues. We present randomization tests for matched and unmatched cohort designs. As is well known, these tests necessarily have proper size under the strong null hypothesis that treatment has no effect on any community response. It is less well known, however, that the size of randomization tests can exceed nominal levels under the 'weak' null hypothesis that intervention does not affect the average community response. Because this weak null hypothesis is of interest in community intervention trials, we study the size of randomization tests by simulation under conditions in which the weak null hypothesis holds but the strong null hypothesis does not. In unmatched studies, size may exceed nominal levels under the weak null hypothesis if there are more intervention than control communities and if the variance among community responses is larger among control communities than among intervention communities; size may also exceed nominal levels if there are more control than intervention communities and if the variance among community responses is larger among intervention communities. Otherwise, size is likely near nominal levels. To avoid such problems, we recommend use of the same numbers of control and intervention communities in unmatched designs. Pair-matched designs usually have size near nominal levels, even under the weak null hypothesis. 
We have identified some extreme cases, unlikely to arise in practice, in which even the size of pair-matched studies can exceed nominal levels. These simulations, however, tend to confirm the robustness of randomization tests for matched and unmatched community intervention trials, particularly if the latter designs have equal numbers of intervention and control communities. We also describe adaptations of randomization tests to allow for covariate adjustment, missing data, and application to cross-sectional surveys. We show that covariate adjustment can increase power, but such power gains diminish as the random component of variation among communities increases, which corresponds to increasing intraclass correlation of responses within communities. We briefly relate our results to model-based methods of inference for community intervention trials that include hierarchical models such as an analysis of variance model with random community effects and fixed intervention effects. Although we have tailored this paper to the design of community intervention trials, many of the ideas apply to other experiments in which one allocates groups or clusters of subjects at random to intervention or control treatments.
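As a rough illustration of the randomization tests described above for unmatched and pair-matched designs, the following Python sketch computes exact two-sided p-values from community-level responses. The abstract does not state which test statistic the authors use, so the difference in mean community response is an assumption here, and the function names are hypothetical. Full enumeration of assignments is practical only for the small numbers of communities typical of such trials.

```python
from itertools import combinations, product


def unmatched_randomization_test(intervention, control):
    """Exact two-sided randomization p-value for an unmatched design.

    Inputs are community-level responses (one number per community).
    Assumed statistic: absolute difference in mean response.
    """
    pooled = list(intervention) + list(control)
    n_int = len(intervention)
    n_ctl = len(pooled) - n_int
    total = sum(pooled)

    def abs_diff(sum_int):
        # Difference in means implied by the intervention-group sum.
        return abs(sum_int / n_int - (total - sum_int) / n_ctl)

    observed = abs_diff(sum(intervention))
    extreme = 0
    n_assignments = 0
    # Enumerate every way the intervention label could have been assigned.
    for idx in combinations(range(len(pooled)), n_int):
        n_assignments += 1
        if abs_diff(sum(pooled[i] for i in idx)) >= observed - 1e-12:
            extreme += 1
    return extreme / n_assignments


def matched_randomization_test(pair_differences):
    """Exact two-sided randomization p-value for a pair-matched design.

    Each entry is (intervention response - control response) in one pair.
    Under the strong null, the sign of each difference is equally likely
    to be + or -, so all 2**n sign patterns are enumerated.
    """
    diffs = list(pair_differences)
    n = len(diffs)
    observed = abs(sum(diffs) / n)
    extreme = 0
    n_patterns = 0
    for signs in product((1, -1), repeat=n):
        n_patterns += 1
        stat = abs(sum(s * d for s, d in zip(signs, diffs)) / n)
        if stat >= observed - 1e-12:
            extreme += 1
    return extreme / n_patterns
```

Both functions are valid under the strong null hypothesis by construction. As the abstract notes, the size under the weak null can exceed nominal levels in unbalanced unmatched designs with unequal variances, which this sketch does nothing to correct.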