A comparative study of R functions for clustered data analysis.

Affiliations

Clinical Trials Methods and Outcomes Lab, Palliative and Advanced Illness Research (PAIR) Center, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA.

Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA.

Publication information

Trials. 2021 Dec 27;22(1):959. doi: 10.1186/s13063-021-05900-7.

DOI:10.1186/s13063-021-05900-7

PMID:34961539

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC8711156/

Abstract

BACKGROUND

Clustered or correlated outcome data are common in medical research, for example in analyses of national or international disease registries and in cluster-randomized trials, where groups of trial participants, rather than individual participants, are randomized to interventions. The within-group correlation in such data requires specific statistical methods, such as generalized estimating equations and mixed-effects models, to account for the correlation and support unbiased statistical inference.
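As a point of reference, one common way to formalize within-cluster correlation for a continuous outcome is a random-intercept model. The abstract does not state the exact model used, so the notation below (y_ij, x_i, b_i, and the variance components) is an assumption made here for illustration only.

```latex
y_{ij} = \beta_0 + \beta_1 x_i + b_i + \varepsilon_{ij}, \qquad
b_i \sim N(0, \sigma_b^2), \quad \varepsilon_{ij} \sim N(0, \sigma_e^2)

\rho = \operatorname{Corr}(y_{ij}, y_{ik})
     = \frac{\sigma_b^2}{\sigma_b^2 + \sigma_e^2}, \qquad j \neq k
```

Here i indexes clusters, j and k index participants within a cluster, x_i is the intervention indicator of cluster i, and \rho is the intra-class correlation. Under this model \rho cannot be negative, whereas a marginal model with an exchangeable (compound-symmetric) correlation structure treats \rho as a free parameter and can return a negative estimate.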

METHODS

We compare different approaches to fitting generalized estimating equations and mixed-effects models for a continuous outcome in R through a simulation study and a data example. The methods are implemented through four popular functions of the statistical software R: "geese", "gls", "lme", and "lmer". In the simulation study, we compare the mean squared error of the estimates of all model parameters and the coverage proportion of the 95% confidence intervals. In the data analysis, we compare the estimates of the intervention effect and of the intra-class correlation.
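The two simulation metrics named above, mean squared error and coverage proportion of the 95% confidence interval, can be sketched in a few lines. The example below is in Python using only the standard library and does not call any of the four R functions. It assumes a random-intercept data-generating model with total variance 1 and swaps in a simple cluster-level analysis (difference of cluster means with a pooled-variance t interval). All function names, parameter values, and the hard-coded t critical value are hypothetical choices for illustration, not the study's settings.

```python
import random
import statistics


def simulate_trial(rng, clusters_per_arm, cluster_size, effect, sigma_b, sigma_e):
    """Simulate one cluster-randomized trial; return cluster means per arm."""
    arms = []
    for treated in (0, 1):
        means = []
        for _ in range(clusters_per_arm):
            b = rng.gauss(0.0, sigma_b)  # cluster-level random intercept
            ys = [effect * treated + b + rng.gauss(0.0, sigma_e)
                  for _ in range(cluster_size)]
            means.append(statistics.fmean(ys))
        arms.append(means)
    return arms[0], arms[1]  # (control, treated)


def estimate_effect(control, treated, t_crit):
    """Difference of cluster means with a pooled-variance t interval."""
    k0, k1 = len(control), len(treated)
    est = statistics.fmean(treated) - statistics.fmean(control)
    pooled = ((k0 - 1) * statistics.variance(control)
              + (k1 - 1) * statistics.variance(treated)) / (k0 + k1 - 2)
    se = (pooled * (1 / k0 + 1 / k1)) ** 0.5
    return est, est - t_crit * se, est + t_crit * se


def run_simulation(n_rep=2000, clusters_per_arm=10, cluster_size=20,
                   effect=0.5, icc=0.1, t_crit=2.100922, seed=1):
    """Return (mean squared error, coverage proportion) of the effect estimate.

    t_crit is the 97.5% quantile of a t distribution with 18 degrees of
    freedom, matching the default of 10 clusters per arm; it must be changed
    if clusters_per_arm changes.
    """
    rng = random.Random(seed)
    # Total variance is fixed at 1, so the between-cluster variance equals icc.
    sigma_b, sigma_e = icc ** 0.5, (1 - icc) ** 0.5
    sq_err, covered = 0.0, 0
    for _ in range(n_rep):
        control, treated = simulate_trial(
            rng, clusters_per_arm, cluster_size, effect, sigma_b, sigma_e)
        est, lower, upper = estimate_effect(control, treated, t_crit)
        sq_err += (est - effect) ** 2
        covered += lower <= effect <= upper
    return sq_err / n_rep, covered / n_rep


if __name__ == "__main__":
    mse, coverage = run_simulation()
    print(f"MSE = {mse:.4f}, coverage = {coverage:.3f}")
```

With equal cluster sizes and normal errors, this cluster-level interval should cover the true effect in about 95% of replicates. The same loop structure applies when the estimator inside it is replaced by a GEE or mixed-model fit.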

RESULTS

In the simulation study, the function "lme" takes the least computation time. There is no difference in the mean squared error among the four functions. The "lmer" function provides better coverage of the fixed effects when the number of clusters is as small as 10. The function "gls" produces confidence intervals for the intra-class correlation with close to nominal coverage. In the data analysis, the "gls" function yields a positive estimate of the intra-class correlation, while the "geese" function gives a negative estimate. Neither confidence interval contains zero.
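A negative intra-class correlation estimate can look surprising, so a small illustration may help. The sketch below uses the classical one-way ANOVA moment estimator for equal-sized clusters. This is not the estimator used by "gls" or "geese", and the function name `anova_icc` is hypothetical. It only shows that an ICC estimate is not constrained to be non-negative once the correlation is estimated directly rather than as a variance ratio.

```python
import statistics


def anova_icc(clusters):
    """One-way ANOVA estimator of the ICC for equal-sized clusters.

    clusters: list of lists, one inner list of outcomes per cluster.
    Returns (MSB - MSW) / (MSB + (m - 1) * MSW), where m is the cluster size.
    """
    k = len(clusters)
    m = len(clusters[0]) if clusters else 0
    if k < 2 or m < 2 or any(len(c) != m for c in clusters):
        raise ValueError("need at least 2 clusters of equal size >= 2")
    grand = statistics.fmean(y for c in clusters for y in c)
    means = [statistics.fmean(c) for c in clusters]
    # Between-cluster and within-cluster mean squares.
    msb = m * sum((mu - grand) ** 2 for mu in means) / (k - 1)
    msw = sum((y - mu) ** 2
              for c, mu in zip(clusters, means) for y in c) / (k * (m - 1))
    denom = msb + (m - 1) * msw
    if denom == 0:
        raise ValueError("all outcomes are identical; ICC is undefined")
    return (msb - msw) / denom
```

For example, `anova_icc([[1, 2, 3], [1, 2, 3]])` returns -0.5: the cluster means are identical, so all variation is within clusters and the estimate reaches its lower bound of -1/(m - 1). In contrast, `anova_icc([[1, 1, 1], [3, 3, 3]])` returns 1.0.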

CONCLUSIONS

The "gls" function efficiently produces an estimate of the intra-class correlation with a confidence interval. When the within-group correlation is as high as 0.5, the confidence interval is not always obtainable.

Fig. 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/58a1/8711156/31343fc335a0/13063_2021_5900_Fig1_HTML.jpg
