Huang Shuang, Fiero Mallorie H, Bell Melanie L
Departments of Epidemiology and Biostatistics, Mel and Enid Zuckerman College of Public Health, University of Arizona, Tucson, AZ, USA
Clin Trials. 2016 Aug;13(4):445-9. doi: 10.1177/1740774516643498. Epub 2016 Apr 19.
BACKGROUND/AIMS: Generalized estimating equations are a common modeling approach in cluster randomized trials for accounting for within-cluster correlation. It is well known that the sandwich variance estimator is biased downward (it underestimates the variance) when the number of clusters is small (≤40), which inflates the type I error rate. Various bias correction methods have been proposed in the statistical literature, but it is unclear how often they are used in current practice for cluster randomized trials. This study aims to evaluate the use of generalized estimating equation bias correction methods in recently published cluster randomized trials and to demonstrate the need for such methods when the number of clusters is small.
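For reference, the sandwich (robust) variance estimator for generalized estimating equations with K clusters, and one widely cited small-sample correction (Mancl and DeRouen, 2001), can be written as below. The notation is ours, not the authors': D_i = ∂μ_i/∂β^T, V_i is the working covariance of cluster i, and ê_i = Y_i − μ̂_i is its vector of fitted residuals. Because fitted residuals are on average smaller than the true errors, the middle term is biased downward, and the effect is large when K is small.

```latex
\widehat{V}_{\mathrm{S}}
  = \Big(\sum_{i=1}^{K} D_i^\top V_i^{-1} D_i\Big)^{-1}
    \Big(\sum_{i=1}^{K} D_i^\top V_i^{-1}\,\hat{e}_i\hat{e}_i^\top\,V_i^{-1} D_i\Big)
    \Big(\sum_{i=1}^{K} D_i^\top V_i^{-1} D_i\Big)^{-1}

% Mancl--DeRouen correction: replace \hat{e}_i\hat{e}_i^\top in the middle term by
(I_i - H_{ii})^{-1}\,\hat{e}_i\hat{e}_i^\top\,(I_i - H_{ii}^\top)^{-1},
\qquad
H_{ii} = D_i\Big(\sum_{k=1}^{K} D_k^\top V_k^{-1} D_k\Big)^{-1} D_i^\top V_i^{-1}
```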
METHODS: We reviewed cluster randomized trials published between August 2013 and July 2014 that used generalized estimating equations for their primary analyses. Two independent reviewers extracted data from each study using a standardized, pre-piloted data extraction template. We also simulated a two-arm cluster randomized trial under various scenarios to show the potential effect of a small number of clusters on the type I error rate when estimating the treatment effect. The nominal level was set at 0.05 for the simulation study.
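The abstract does not give the simulation settings, so the following is only a minimal sketch under our own assumptions: a normally distributed outcome with a random cluster intercept, equal cluster sizes, no true treatment effect, and a generalized estimating equation model with identity link, independence working correlation and an arm indicator as the only covariate (which reduces to comparing arm means). The function names `simulate_trial`, `wald_z` and `type1_error` and all parameter values are hypothetical. The optional `mancl_derouen` flag applies the Mancl and DeRouen (2001) residual adjustment, which in this particular design reduces to dividing each cluster's residual sum by 1 − m_i/N, where m_i is the cluster size and N the number of individuals in that arm.

```python
import math
import random

# Two-sided 5% critical value of the standard normal (nominal level 0.05).
Z_CRIT = 1.959963984540054


def simulate_trial(n_per_arm, m, sigma_b, sigma_w, rng):
    """Simulate one two-arm cluster randomized trial with NO treatment effect.

    Hypothetical data model: y_ij = b_i + e_ij, where b_i ~ N(0, sigma_b^2) is a
    cluster random intercept and e_ij ~ N(0, sigma_w^2); every cluster has m members.
    Returns a list of (arm, outcomes) pairs, one per cluster.
    """
    clusters = []
    for arm in (0, 1):
        for _ in range(n_per_arm):
            b = rng.gauss(0.0, sigma_b)
            clusters.append((arm, [b + rng.gauss(0.0, sigma_w) for _ in range(m)]))
    return clusters


def wald_z(clusters, mancl_derouen=False):
    """Wald z statistic for the treatment effect using a sandwich variance.

    Assumes identity link, independence working correlation and an arm
    indicator as the only covariate: the estimate is the difference in arm
    means, and the sandwich variance is, per arm, the sum of squared
    within-cluster residual sums divided by N^2 (N = individuals in the arm).
    """
    means, variances = {}, {}
    for arm in (0, 1):
        arm_clusters = [ys for a, ys in clusters if a == arm]
        n_ind = sum(len(ys) for ys in arm_clusters)
        mean = sum(sum(ys) for ys in arm_clusters) / n_ind
        meat = 0.0
        for ys in arm_clusters:
            resid_sum = sum(y - mean for y in ys)
            if mancl_derouen:
                # In this design H_ii = J / n_ind, so (I - H_ii)^{-1} scales
                # the cluster's residual sum by 1 / (1 - m_i / n_ind).
                resid_sum /= 1.0 - len(ys) / n_ind
            meat += resid_sum ** 2
        means[arm] = mean
        variances[arm] = meat / n_ind ** 2
    return (means[1] - means[0]) / math.sqrt(variances[0] + variances[1])


def type1_error(n_clusters, reps=2000, m=10, sigma_b=1.0, sigma_w=3.0,
                seed=1, mancl_derouen=False):
    """Monte Carlo rate of rejecting the (true) null at the 0.05 level."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        trial = simulate_trial(n_clusters // 2, m, sigma_b, sigma_w, rng)
        if abs(wald_z(trial, mancl_derouen)) > Z_CRIT:
            rejections += 1
    return rejections / reps


if __name__ == "__main__":
    for k in (4, 10, 20, 40):
        print(f"{k:>2} clusters: unadjusted {type1_error(k):.3f}, "
              f"Mancl-DeRouen {type1_error(k, mancl_derouen=True):.3f}")
```

Under these assumed settings the unadjusted rate with four clusters should come out far above 0.05 (roughly 0.3) and move toward 0.05 as clusters are added; the adjusted rate is lower but, with a normal reference distribution and only two clusters per arm, still above 0.05. The numbers are not expected to match those reported in the abstract, since the authors' scenarios differ. Closed forms are used instead of a GEE library only to keep the sketch dependency-free.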
RESULTS: Of the 51 included trials, 28 (54.9%) analyzed 40 or fewer clusters; the smallest of these analyzed four clusters in total. Of these 28 trials, only one used a bias correction method for generalized estimating equations. In the simulation study, the type I error rate ranged from 0.43 to 0.47 with four clusters. Although the type I error rate moved closer to the nominal level as the number of clusters increased, it still ranged from 0.06 to 0.07 with 40 clusters.
CONCLUSIONS: Our results show that the statistical issues arising from a small number of clusters in generalized estimating equations are currently handled inadequately in cluster randomized trials. The potential for type I error inflation can be very high when the sandwich estimator is used without bias correction.