Conditional generalized estimating equations for the analysis of clustered and longitudinal data.

Author information

Goetgeluk Sylvie, Vansteelandt Stijn

Affiliation

Department of Applied Mathematics and Computer Sciences, Ghent University, Krijgslaan 281 S9, 9000 Ghent, Belgium.

Publication information

Biometrics. 2008 Sep;64(3):772-780. doi: 10.1111/j.1541-0420.2007.00944.x. Epub 2007 Nov 19.

DOI:10.1111/j.1541-0420.2007.00944.x

PMID:18047524

Abstract

A common and important problem in clustered sampling designs is that the effect of within-cluster exposures (i.e., exposures that vary within clusters) on the outcome may be confounded by both measured and unmeasured cluster-level factors (i.e., measurements that do not vary within clusters). When some of these factors are poorly accounted for or not accounted for at all, estimation of this effect through population-averaged models or random-effects models may introduce bias. We accommodate this by developing a general theory for the analysis of clustered data, which enables consistent and asymptotically normal estimation of the effects of within-cluster exposures in the presence of cluster-level confounders. Semiparametric efficient estimators are obtained by solving so-called conditional generalized estimating equations. We compare this approach with a popular proposal by Neuhaus and Kalbfleisch (1998, Biometrics 54, 638-645), who separate the exposure effect into a within- and a between-cluster component within a random intercept model. We find that the latter approach yields consistent and efficient estimators when the model is linear, but is less flexible in terms of model specification. Under nonlinear models, this approach may yield inconsistent and inefficient estimators, though with little bias in most practical settings.
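The within/between separation mentioned in the abstract can be illustrated for the linear case with a small simulation. This is only a sketch under stated assumptions, not the authors' implementation: the data-generating model, the effect sizes, and the names `simulate` and `ols` are hypothetical, and ordinary least squares stands in for the random intercept model (an assumption made to keep the example dependency-free). An unmeasured cluster-level factor `u` drives both the exposure and the outcome, so a pooled regression on the raw exposure is confounded, while the coefficient on the cluster-centered exposure is not.

```python
import numpy as np


def simulate(n_clusters=2000, cluster_size=4, beta=1.0, seed=0):
    """Hypothetical linear model with an unmeasured cluster-level confounder."""
    rng = np.random.default_rng(seed)
    # u is constant within a cluster and is treated as unobserved.
    u = rng.normal(size=(n_clusters, 1))
    # The exposure varies within clusters but is correlated with u.
    x = u + rng.normal(size=(n_clusters, cluster_size))
    # The outcome depends on the exposure (true effect beta) and on u.
    y = beta * x + 2.0 * u + rng.normal(size=(n_clusters, cluster_size))
    return x, y


def ols(design, response):
    """Least-squares coefficients for the given design matrix."""
    coef, *_ = np.linalg.lstsq(design, response, rcond=None)
    return coef


x, y = simulate()

# Split the exposure into a between-cluster part (the cluster mean)
# and a within-cluster part (the deviation from the cluster mean).
x_bar = x.mean(axis=1, keepdims=True)
x_within = (x - x_bar).ravel()
x_between = np.broadcast_to(x_bar, x.shape).ravel()

ones = np.ones(x.size)

# Pooled regression on the raw exposure: confounded by u.
naive_slope = ols(np.column_stack([ones, x.ravel()]), y.ravel())[1]

# Regression with separate within and between components.
decomposed = ols(np.column_stack([ones, x_within, x_between]), y.ravel())
within_slope, between_slope = decomposed[1], decomposed[2]

print(f"naive slope:   {naive_slope:.3f}")
print(f"within slope:  {within_slope:.3f}")
print(f"between slope: {between_slope:.3f}")
```

Because the centered exposure sums to zero in every cluster, it is orthogonal to any cluster-constant factor, which is why the within coefficient recovers the simulated effect in this linear setting while the naive and between slopes absorb the confounding. The sketch does not cover the nonlinear case, where the abstract notes that this separation may be inconsistent.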
