Panageas Katherine S, Schrag Deborah, Localio A Russell, Venkatraman E S, Begg Colin B
Department of Epidemiology and Biostatistics, Memorial Sloan-Kettering Cancer Center, New York, NY 10021, USA.
Stat Med. 2007 Apr 30;26(9):2017-35. doi: 10.1002/sim.2657.
In recent years, health services researchers have conducted 'volume-outcome' studies to evaluate whether providers (hospitals or surgeons) who treat many patients for a specialized condition have better outcomes than those who treat few. These studies, with their inherent clustering of events by provider, present an unusual statistical problem. The volume-outcome setting is unique in that 'volume' is both the primary factor under study and the cluster size. Consequently, the assumptions underlying available methods that correct for clustering might be violated in this setting. To address this issue, we investigate via simulation the properties of three estimation procedures for the analysis of cluster-correlated data, specifically in the context of volume-outcome studies. We examine and compare the validity and efficiency of widely available statistical techniques that have been used in such studies: generalized estimating equations (GEE) using both the independence and exchangeable correlation structures; random effects models; and the weighted GEE approach proposed by Williamson et al. (Biometrics 2003; 59:36-42) to account for informative clustering. Using data generated from either an underlying true random effects model or a cluster-correlated model, we show that both the random effects model and GEE with an exchangeable correlation structure have generally good properties, with relatively low bias for estimating the volume parameter and its variance. By contrast, the cluster-weighted GEE method is inefficient.
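The kind of simulation described above can be sketched in a few lines. The following is only an illustrative sketch, not the authors' simulation design: the function names (`simulate_providers`, `fit_gee_independence`), the parameter values, the uniform volume distribution, and the use of centered log-volume as the covariate are all assumptions made for this example. It generates binary outcomes from a random-intercept logistic model in which each provider's volume is also its cluster size, then fits an independence-working-correlation GEE with a cluster-robust (sandwich) standard error, optionally weighting each provider by the inverse of its cluster size in the spirit of the cluster-weighted approach. The random effects model and the exchangeable-correlation GEE are not implemented here.

```python
import math
import random


def simulate_providers(n_providers=200, beta0=-2.0, beta_vol=-0.3,
                       sigma=0.5, seed=1):
    """Simulate one data set; returns a list of (x, volume, events) per provider.

    All parameter values are hypothetical choices for illustration.
    """
    rng = random.Random(seed)
    clusters = []
    for _ in range(n_providers):
        volume = rng.randint(5, 100)            # volume is also the cluster size
        x = math.log(volume) - math.log(30.0)   # centered log-volume covariate
        u = rng.gauss(0.0, sigma)               # provider-level random intercept
        p = 1.0 / (1.0 + math.exp(-(beta0 + beta_vol * x + u)))
        events = sum(1 for _ in range(volume) if rng.random() < p)
        clusters.append((x, volume, events))
    return clusters


def fit_gee_independence(clusters, weighted=False, n_iter=25):
    """Logistic GEE with independence working correlation.

    If weighted is True, each provider is weighted by 1 / cluster size.
    Returns (slope estimate, sandwich standard error of the slope).
    """
    b0, b1 = 0.0, 0.0
    for _ in range(n_iter):                     # Newton-Raphson iterations
        s0 = s1 = a00 = a01 = a11 = 0.0
        for x, n, y in clusters:
            w = 1.0 / n if weighted else 1.0
            mu = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            r = w * (y - n * mu)                # cluster score contribution
            v = w * n * mu * (1.0 - mu)         # cluster information contribution
            s0 += r
            s1 += r * x
            a00 += v
            a01 += v * x
            a11 += v * x * x
        det = a00 * a11 - a01 * a01
        b0 += (a11 * s0 - a01 * s1) / det
        b1 += (a00 * s1 - a01 * s0) / det

    # Sandwich variance A^{-1} B A^{-1}, evaluated at the final estimates.
    a00 = a01 = a11 = m00 = m01 = m11 = 0.0
    for x, n, y in clusters:
        w = 1.0 / n if weighted else 1.0
        mu = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
        r = w * (y - n * mu)
        v = w * n * mu * (1.0 - mu)
        a00 += v
        a01 += v * x
        a11 += v * x * x
        m00 += r * r
        m01 += r * r * x
        m11 += r * r * x * x
    det = a00 * a11 - a01 * a01
    c0, c1 = -a01 / det, a00 / det              # slope row of A^{-1}
    var_b1 = c0 * c0 * m00 + 2.0 * c0 * c1 * m01 + c1 * c1 * m11
    return b1, math.sqrt(var_b1)


if __name__ == "__main__":
    data = simulate_providers()
    for label, flag in (("unweighted", False), ("cluster-weighted", True)):
        est, se = fit_gee_independence(data, weighted=flag)
        print(f"{label}: slope = {est:.3f}, robust SE = {se:.3f}")
```

Repeating this over many seeds and comparing the spread of the slope estimates with the average robust standard error is one way to examine bias and efficiency. Note that the marginal slope targeted by GEE is generally somewhat attenuated relative to the conditional slope used to generate the data.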