Clinical Trials Methods and Outcomes Lab, Palliative and Advanced Illness Research (PAIR) Center, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA.
Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA.
Trials. 2021 Dec 27;22(1):959. doi: 10.1186/s13063-021-05900-7.
Clustered or correlated outcome data are common in medical research, for example in analyses of national or international disease registries and in cluster-randomized trials, where groups of participants, rather than individual participants, are randomized to interventions. The within-group correlation in clustered data calls for specific statistical methods, such as generalized estimating equations and mixed-effects models, that account for this correlation and support unbiased statistical inference.
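For a continuous outcome, a random-intercept model is one minimal way to see where the within-group correlation comes from. The notation here is ours and assumes the following: cluster i has m participants indexed by j, X_i is a cluster-level intervention indicator, and the random intercept b_i is independent of the residual ε_ij.

```latex
\begin{aligned}
Y_{ij} &= \beta_0 + \beta_1 X_i + b_i + \varepsilon_{ij},
  \qquad b_i \sim N(0, \sigma_b^2), \quad \varepsilon_{ij} \sim N(0, \sigma_e^2), \\
\operatorname{Var}(Y_{ij}) &= \sigma_b^2 + \sigma_e^2, \qquad
\operatorname{Cov}(Y_{ij}, Y_{ik}) = \sigma_b^2 \quad (j \neq k), \\
\rho &= \frac{\sigma_b^2}{\sigma_b^2 + \sigma_e^2}, \qquad
\operatorname{Var}(\bar{Y}_i) = \sigma_b^2 + \frac{\sigma_e^2}{m}
  = \frac{(\sigma_b^2 + \sigma_e^2)\,[1 + (m - 1)\rho]}{m}.
\end{aligned}
```

Under this model, every pair of observations in the same cluster has the same correlation ρ, which is the exchangeable (compound-symmetry) structure. When ρ > 0, the factor 1 + (m - 1)ρ inflates the variance of a cluster mean. An analysis of a cluster-level intervention that ignores this inflation therefore understates standard errors. Written as a ratio of variances, ρ lies in [0, 1). A marginal model that estimates the exchangeable correlation directly is not bound by that restriction.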
We compare different approaches to fitting generalized estimating equations and mixed-effects models for a continuous outcome in R through a simulation study and a data example. The methods are implemented with four popular functions of the statistical software R: "geese", "gls", "lme", and "lmer". In the simulation study, we compare the mean squared error of the estimates of all model parameters and the coverage proportion of the 95% confidence intervals. In the data analysis, we compare the estimates of the intervention effect and of the intra-class correlation.
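The two simulation metrics, mean squared error and coverage proportion, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' simulation code. It assumes a two-arm trial with equal cluster sizes and a random-intercept data-generating model. In place of the four R functions, it uses a simple cluster-level analysis: the difference of arm averages of cluster means, with a normal-approximation 95% interval. The helper names `simulate_trial`, `cluster_level_estimate`, and `mse_and_coverage` and all parameter values are hypothetical.

```python
import random
import statistics
from statistics import NormalDist


def simulate_trial(n_clusters, m, beta1, sigma_b, sigma_e, rng):
    """One hypothetical two-arm cluster-randomized trial (random-intercept model).

    The first half of the clusters receive the intervention (x = 1); all m
    participants in a cluster share the same random intercept b_i.
    """
    clusters = []
    for i in range(n_clusters):
        x = 1 if i < n_clusters // 2 else 0
        b_i = rng.gauss(0.0, sigma_b)
        ys = [beta1 * x + b_i + rng.gauss(0.0, sigma_e) for _ in range(m)]
        clusters.append((x, ys))
    return clusters


def cluster_level_estimate(clusters, level=0.95):
    """Stand-in estimator (not geese/gls/lme/lmer): difference of arm averages
    of cluster means, with a normal-approximation (Wald) confidence interval."""
    treated = [statistics.fmean(ys) for x, ys in clusters if x == 1]
    control = [statistics.fmean(ys) for x, ys in clusters if x == 0]
    est = statistics.fmean(treated) - statistics.fmean(control)
    se = (statistics.variance(treated) / len(treated)
          + statistics.variance(control) / len(control)) ** 0.5
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for level = 0.95
    return est, est - z * se, est + z * se


def mse_and_coverage(n_sims, n_clusters, m, beta1, sigma_b, sigma_e, seed=1):
    """Monte Carlo mean squared error and CI coverage for the intervention effect."""
    rng = random.Random(seed)
    sq_err, covered = 0.0, 0
    for _ in range(n_sims):
        est, lo, hi = cluster_level_estimate(
            simulate_trial(n_clusters, m, beta1, sigma_b, sigma_e, rng))
        sq_err += (est - beta1) ** 2
        covered += lo <= beta1 <= hi  # True counts as 1
    return sq_err / n_sims, covered / n_sims


if __name__ == "__main__":
    for k in (10, 50):
        mse, cov = mse_and_coverage(2000, k, m=20, beta1=1.0,
                                    sigma_b=0.5, sigma_e=1.0)
        print(f"{k} clusters: MSE = {mse:.4f}, coverage = {cov:.3f}")
```

The interval uses a normal critical value with a standard error estimated from only a few clusters per arm. As a result, its coverage with 10 clusters typically falls below the nominal 95%, and it approaches 95% as the number of clusters grows. This is the kind of small-sample behavior a coverage comparison is designed to detect.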
In the simulation study, the "lme" function has the shortest computation time. The mean squared error does not differ among the four functions. The "lmer" function provides better coverage of the fixed effects when the number of clusters is as small as 10. The "gls" function produces confidence intervals for the intra-class correlation with close to nominal coverage. In the data analysis, the "gls" function yields a positive estimate of the intra-class correlation, while the "geese" function gives a negative estimate. Neither confidence interval contains zero.
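One general reason ICC estimates can differ in sign is that moment-type estimates of an exchangeable correlation are not constrained to be non-negative, whereas a random-intercept variance component is. The Python sketch below illustrates this with a one-way ANOVA estimator. It is an analogue under stated assumptions (equal cluster sizes, two arms), not the estimator used by "geese" or "gls". The function names `mean_squares`, `anova_icc`, and `truncated_icc` and the example data are hypothetical.

```python
import statistics


def mean_squares(clusters):
    """Between- and within-cluster mean squares for equal-sized clusters.

    `clusters` is a list of (arm, values) pairs. Cluster means are centred on
    their own arm's mean, so the intervention effect is not counted as
    between-cluster variation.
    """
    k = len(clusters)
    m = len(clusters[0][1])  # assumes every cluster has m observations
    means = [statistics.fmean(ys) for _, ys in clusters]
    arms = sorted({arm for arm, _ in clusters})
    arm_mean = {
        a: statistics.fmean([mu for (arm, _), mu in zip(clusters, means) if arm == a])
        for a in arms
    }
    # Between clusters within arms: df = k - number of arms
    msb = m * sum((mu - arm_mean[arm]) ** 2
                  for (arm, _), mu in zip(clusters, means)) / (k - len(arms))
    # Within clusters: df = k * (m - 1)
    msw = sum((y - mu) ** 2
              for (_, ys), mu in zip(clusters, means) for y in ys) / (k * (m - 1))
    return msb, msw, m


def anova_icc(clusters):
    """Moment estimate of the ICC; it can be negative, down to -1 / (m - 1)."""
    msb, msw, m = mean_squares(clusters)
    return (msb - msw) / (msb + (m - 1) * msw)


def truncated_icc(clusters):
    """Same moments, but the between-cluster variance is truncated at zero,
    mimicking the rho >= 0 constraint of a random-intercept parameterization."""
    msb, msw, m = mean_squares(clusters)
    sigma_b2 = max(0.0, (msb - msw) / m)
    return sigma_b2 / (sigma_b2 + msw)


if __name__ == "__main__":
    # Hypothetical data: within-cluster spread dominates between-cluster spread.
    neg = [(0, [0.0, 2.0]), (0, [0.0, 2.0]), (1, [1.0, 3.0]), (1, [1.0, 3.0])]
    print(anova_icc(neg), truncated_icc(neg))
```

For `neg`, `anova_icc` returns -1.0, the lower bound -1/(m - 1) for clusters of size m = 2, while `truncated_icc` returns 0.0.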
The "gls" function efficiently produces an estimate of the intra-class correlation with a confidence interval. When the within-group correlation is as high as 0.5, the confidence interval is not always obtainable.