Twisk Jos W R
Department of Clinical Epidemiology and Biostatistics, EMGO-institute, Vrije Universiteit medical centre (VUmc), Amsterdam, The Netherlands.
Eur J Epidemiol. 2004;19(8):769-76. doi: 10.1023/b:ejep.0000036572.00663.f2.
The analysis of data from longitudinal studies requires special techniques that take into account the fact that repeated measurements within one individual are correlated. In this paper, the two most commonly used techniques for analyzing longitudinal data are compared: generalized estimating equations (GEE) and random coefficient analysis. Both techniques were used to analyze a longitudinal dataset with six measurements on 147 subjects. The purpose of the example was to analyze the relationship between serum cholesterol and four predictor variables: physical fitness at baseline, body fatness (measured by the sum of the thicknesses of four skinfolds), smoking and gender. The results showed that for a continuous outcome variable, GEE and random coefficient analysis gave comparable results; in fact, GEE analysis with an exchangeable correlation structure and random coefficient analysis with only a random intercept gave the same results. There was also no difference between the two techniques in the analysis of a dataset with missing data, even when the missing data were highly selective on earlier observed data. For a dichotomous outcome variable, the regression coefficients and standard errors were larger in magnitude when calculated with random coefficient analysis than when calculated with GEE analysis. Analysis of a dataset with missing data and a dichotomous outcome variable showed unpredictable results for both GEE and random coefficient analysis. It can be concluded that for a continuous outcome variable, GEE and random coefficient analysis are comparable. Longitudinal data analysis with dichotomous outcome variables should, however, be interpreted with caution, especially when there are missing data.
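The two main findings can be illustrated with a small calculation. This is a sketch under stated assumptions, not the paper's own analysis. First, a linear model with only a random intercept, y_it = x_it'b + u_i + e_it, implies that every pair of a subject's repeated measurements has the same correlation var_u / (var_u + var_e). That is the exchangeable (compound-symmetry) structure used as the GEE working correlation, which is one way to see why the two approaches coincide for a continuous outcome. Second, a standard explanation for the dichotomous result (not stated in the abstract) is that random-intercept logistic coefficients are subject-specific, while GEE estimates population-averaged coefficients, which are attenuated toward zero. The helper names, the parameter values (b0 = -1, b1 = 1, sd_u = 2, var_u = 1, var_e = 3) and the midpoint-rule integration are illustrative assumptions. The closed-form comparison uses the approximation of Zeger, Liang and Albert (1988).

```python
import math


def random_intercept_covariance(n_times, var_u, var_e):
    """Within-subject covariance implied by y_it = x_it'b + u_i + e_it,
    with u_i ~ N(0, var_u) and e_it ~ N(0, var_e) independent."""
    return [[var_u + (var_e if s == t else 0.0) for t in range(n_times)]
            for s in range(n_times)]


def to_correlation(cov):
    """Rescale a covariance matrix to a correlation matrix."""
    n = len(cov)
    return [[cov[s][t] / math.sqrt(cov[s][s] * cov[t][t]) for t in range(n)]
            for s in range(n)]


def expit(z):
    return 1.0 / (1.0 + math.exp(-z))


def logit(p):
    return math.log(p / (1.0 - p))


def marginal_probability(eta, sd_u, n_points=4000, width=10.0):
    """P(Y = 1) averaged over u ~ N(0, sd_u^2) for a random-intercept logistic
    model with linear predictor eta + u, using midpoint-rule integration."""
    lo, hi = -width * sd_u, width * sd_u
    step = (hi - lo) / n_points
    total = 0.0
    for k in range(n_points):
        u = lo + (k + 0.5) * step
        density = math.exp(-0.5 * (u / sd_u) ** 2) / (sd_u * math.sqrt(2.0 * math.pi))
        total += expit(eta + u) * density * step
    return total


def population_averaged_slope(b0, b1, sd_u):
    """Marginal (population-averaged) log odds ratio for a 0/1 covariate x
    implied by the subject-specific coefficient b1 of a random-intercept model."""
    p0 = marginal_probability(b0, sd_u)
    p1 = marginal_probability(b0 + b1, sd_u)
    return logit(p1) - logit(p0)


def zeger_liang_albert_approx(b1, sd_u):
    """Approximate attenuation b1 / sqrt(1 + c^2 * sd_u^2), c = 16*sqrt(3)/(15*pi)."""
    c = 16.0 * math.sqrt(3.0) / (15.0 * math.pi)
    return b1 / math.sqrt(1.0 + c ** 2 * sd_u ** 2)


if __name__ == "__main__":
    # Continuous outcome: six repeated measurements, illustrative variances.
    corr = to_correlation(random_intercept_covariance(6, var_u=1.0, var_e=3.0))
    print("within-subject correlation:", corr[0][1])
    # Dichotomous outcome: illustrative subject-specific coefficients.
    b0, b1, sd_u = -1.0, 1.0, 2.0
    print("subject-specific slope:", b1)
    print("population-averaged slope:", round(population_averaged_slope(b0, b1, sd_u), 3))
    print("approximation:", round(zeger_liang_albert_approx(b1, sd_u), 3))
```

The binary covariate (for example, gender) is chosen because the marginal model for a 0/1 covariate has only two points, so the difference in marginal log odds is exactly the population-averaged coefficient. With sd_u = 0 the two coefficients would coincide, and the gap widens as the between-subject variance grows.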