Shersten Killip, Ziyad Mahfoud, Kevin Pearce
Department of Family Practice and Community Medicine, University of Kentucky, Lexington, KY, USA.
Ann Fam Med. 2004 May-Jun;2(3):204-8. doi: 10.1370/afm.141.
Primary care research often involves clustered samples in which subjects are randomized at a group level but analyzed at an individual level. Analyses that do not take this clustering into account may report significance where none exists. This article explores the causes, consequences, and implications of clustered data.
Using a case study with accompanying equations, we show that clustered samples are not as statistically efficient as simple random samples.
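One common way to write this loss of efficiency is the standard (Kish) design effect. The notation below is an assumption and may differ from the article's own equations: k clusters of m subjects each, with σ_b² and σ_w² the between-cluster and within-cluster variances.

$$
\rho = \frac{\sigma_b^2}{\sigma_b^2 + \sigma_w^2}, \qquad
\mathrm{DE} = 1 + (m - 1)\rho, \qquad
\mathrm{ESS} = \frac{mk}{\mathrm{DE}} = \frac{mk}{1 + (m - 1)\rho}
$$

Because ρ ≥ 0 in this variance-components form, DE ≥ 1 and therefore ESS ≤ mk. A clustered sample of mk subjects carries at most as much information as a simple random sample of the same size, with equality only when ρ = 0 or m = 1.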
Similarity among subjects within preexisting groups or clusters reduces the variability of responses in a clustered sample, which erodes the power to detect true differences between study arms. This similarity is expressed by the intracluster correlation coefficient, or ρ (rho), which compares the between-group variance with the total (between-group plus within-group) variance. Rho is used, together with the cluster size and the number of clusters, to calculate the effective sample size (ESS) of a clustered design. The ESS should be used to calculate power in the design phase of a clustered study. Appropriately accounting for similarities among subjects in a cluster almost always results in a net loss of power, requiring increased total subject recruitment. Increasing the number of clusters enhances power more efficiently than does increasing the number of subjects within a cluster.
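The article's own case study is not reproduced here. As a minimal sketch, assuming equal cluster sizes and a continuous outcome, the Python below estimates ρ with the standard one-way ANOVA estimator and converts it to an ESS via the design effect 1 + (m − 1)ρ. The function names (`icc_anova`, `design_effect`, `effective_sample_size`) and the value ρ = 0.05 are hypothetical, chosen only for illustration.

```python
from statistics import mean


def icc_anova(clusters: list[list[float]]) -> float:
    """One-way ANOVA estimate of the intracluster correlation coefficient (rho).

    Assumes k >= 2 clusters of equal size m >= 2. The estimate can come out
    slightly negative when clusters are no more alike than chance predicts.
    """
    k = len(clusters)
    m = len(clusters[0]) if clusters else 0
    if k < 2 or m < 2 or any(len(c) != m for c in clusters):
        raise ValueError("need >= 2 clusters, all of the same size >= 2")
    cluster_means = [mean(c) for c in clusters]
    grand_mean = mean([x for c in clusters for x in c])
    # Between-cluster mean square: spread of cluster means around the grand mean.
    msb = m * sum((cm - grand_mean) ** 2 for cm in cluster_means) / (k - 1)
    # Within-cluster mean square: spread of subjects around their own cluster mean.
    msw = sum((x - cm) ** 2 for c, cm in zip(clusters, cluster_means) for x in c) / (k * (m - 1))
    return (msb - msw) / (msb + (m - 1) * msw)


def design_effect(m: int, rho: float) -> float:
    """Variance inflation from sampling m subjects per cluster: 1 + (m - 1) * rho."""
    return 1 + (m - 1) * rho


def effective_sample_size(k: int, m: int, rho: float) -> float:
    """Number of independent subjects that k clusters of size m are 'worth'."""
    return k * m / design_effect(m, rho)


if __name__ == "__main__":
    rho = 0.05  # illustrative ICC only, not a value reported in the article
    # Compare a baseline design, doubling the clusters, and doubling cluster size.
    for k, m in [(10, 20), (20, 20), (10, 40)]:
        ess = effective_sample_size(k, m, rho)
        print(f"k={k:>2} clusters, m={m:>2} per cluster, N={k * m:>3}, ESS={ess:.1f}")
```

With ρ = 0.05, 10 clusters of 20 subjects give an ESS of about 103. Doubling the number of clusters (N = 400) roughly doubles the ESS to about 205, whereas doubling the cluster size instead (also N = 400) raises it only to about 136, because the design effect itself grows with m.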
Primary care research frequently uses clustered designs, whether consciously or unconsciously. Researchers must recognize and understand the implications of clusters to avoid costly sample size errors.