A simulation study of sample size for multilevel logistic regression models.

Authors

Moineddin Rahim, Matheson Flora I, Glazier Richard H

Affiliation

Department of Public Health Sciences, University of Toronto, Toronto, Canada.

Publication

BMC Med Res Methodol. 2007 Jul 16;7:34. doi: 10.1186/1471-2288-7-34.

Abstract

BACKGROUND

Many studies in the health and social sciences collect individual-level data as outcome measures. Such data usually have a hierarchical structure, with patients clustered within physicians and physicians clustered within practices. Large surveys, including national surveys, also have a hierarchical or clustered structure: respondents are naturally clustered in geographical units (e.g., health regions) and may be further grouped into smaller units. Outcomes of interest in many fields include not only continuous measures but also binary outcomes such as depression, the presence or absence of a disease, and self-reported general health. In the framework of multilevel studies, an important problem is determining a sample size adequate to produce unbiased and accurate estimates.
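For concreteness, a two-level logistic model of the kind described here, for individual i in group j with a single individual-level covariate x_ij, can be written as below. This is a generic random-intercept and random-slope form assumed for illustration; the paper's exact covariates and random-effects structure may differ.

```latex
\operatorname{logit} \Pr(y_{ij} = 1 \mid u_{0j}, u_{1j})
  = \beta_0 + \beta_1 x_{ij} + u_{0j} + u_{1j} x_{ij},
\qquad
\begin{pmatrix} u_{0j} \\ u_{1j} \end{pmatrix}
  \sim N\!\left(
    \begin{pmatrix} 0 \\ 0 \end{pmatrix},
    \begin{pmatrix} \sigma_{u0}^2 & \sigma_{u01} \\ \sigma_{u01} & \sigma_{u1}^2 \end{pmatrix}
  \right)
```

In this form, β0 and β1 are the fixed-effect parameters, and the entries of the 2×2 covariance matrix are the variance-covariance components.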

METHODS

In this paper, simulation studies are used to assess how varying the sample size at both the individual and group levels affects the accuracy of the estimates of the fixed-effect parameters and variance components of multilevel logistic regression models. The influence of the outcome's prevalence and of the intra-class correlation coefficient (ICC) is also examined.
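A minimal sketch of how one replicate of such a simulation could be generated, assuming a two-level random-intercept logistic model with one standard-normal individual-level covariate. The ICC is taken on the latent scale, ICC = σ_u² / (σ_u² + π²/3), a common convention for logistic models but an assumption here. The function names, the covariate, and the value of `beta1` are hypothetical and not taken from the paper. Fitting the multilevel model to each replicate (for example with a mixed-model package) is not shown.

```python
import numpy as np

def sigma_u2_from_icc(icc):
    # Latent-scale ICC for a logistic model: ICC = s2 / (s2 + pi^2/3), solved for s2.
    return icc * (np.pi ** 2 / 3) / (1 - icc)

def simulate_replicate(n_groups, group_size, prevalence, icc, beta1, rng):
    """One simulated data set from a hypothetical random-intercept logistic model.

    `prevalence` sets the intercept to logit(prevalence), i.e. the event
    probability for an individual with x = 0 in a group with u_j = 0; the
    marginal prevalence differs somewhat once sigma_u^2 > 0 or beta1 != 0.
    """
    beta0 = np.log(prevalence / (1 - prevalence))
    sigma_u = np.sqrt(sigma_u2_from_icc(icc))
    group = np.repeat(np.arange(n_groups), group_size)  # group index j for each individual
    u = rng.normal(0.0, sigma_u, size=n_groups)         # group-level random intercepts u_j
    x = rng.normal(0.0, 1.0, size=group.size)           # individual-level covariate x_ij
    eta = beta0 + beta1 * x + u[group]                  # linear predictor on the logit scale
    p = 1.0 / (1.0 + np.exp(-eta))                      # inverse logit
    y = rng.binomial(1, p)                              # binary outcome y_ij
    return group, x, y

rng = np.random.default_rng(2007)
group, x, y = simulate_replicate(n_groups=100, group_size=50,
                                 prevalence=0.1, icc=0.1, beta1=0.5, rng=rng)
print(y.size, y.mean())
```

Repeating this over many replicates for each combination of number of groups, group size, prevalence, and ICC gives the kind of design grid the study describes.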

RESULTS

The results show that the estimates of the fixed-effect parameters are unbiased with 100 groups and a group size of 50 or more. The estimates of the variance-covariance components are slightly biased even with 100 groups and a group size of 50. Biases in both fixed and random effects are severe for a group size of 5. The standard errors of the fixed-effect parameters are unbiased, whereas those of the variance-covariance components are underestimated. The results also suggest that low-prevalence events require larger samples, with a minimum of 100 groups and 50 individuals per group.
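One common way to quantify the bias and standard-error findings summarized above is percent relative bias, together with the ratio of the average estimated standard error to the empirical standard deviation of the estimates across replicates. The sketch below assumes these two criteria; the paper's exact accuracy measures may differ, and the function names and input values are hypothetical.

```python
import numpy as np

def relative_bias(estimates, true_value):
    """Percent relative bias of a set of replicate estimates."""
    return 100.0 * (np.mean(estimates) - true_value) / true_value

def se_ratio(estimates, standard_errors):
    """Mean model-based SE over the empirical SD of the estimates across replicates.

    A ratio near 1 suggests the SEs are approximately unbiased; a ratio
    below 1 suggests they are underestimated.
    """
    return np.mean(standard_errors) / np.std(estimates, ddof=1)

# Illustrative numbers only, not results from the paper.
est = np.array([1.0, 1.5, 2.0])
ses = np.array([0.5, 0.5, 0.5])
print(relative_bias(est, true_value=1.0), se_ratio(est, ses))
```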

CONCLUSION

We recommend a minimum group size of 50 with at least 50 groups to produce valid estimates for multilevel logistic regression models. When the prevalence of events is low, group size should be increased so that the expected number of events in each group is greater than one.
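The prevalence adjustment can be read as a simple rule: with prevalence p and group size n, the expected number of events per group is n·p, so n must exceed 1/p, while the general floor of 50 still applies. The sketch below encodes that reading; the function name and the `min_size` default are illustrative. It does not address the number of groups, for which the results suggest at least 100 when events are rare.

```python
import math

def minimum_group_size(prevalence, min_size=50):
    """Smallest group size n with n * prevalence > 1, never below min_size."""
    if not 0 < prevalence < 1:
        raise ValueError("prevalence must be in (0, 1)")
    events_rule = math.floor(1 / prevalence) + 1  # smallest integer n with n > 1/p
    return max(min_size, events_rule)

for p in (0.1, 0.01, 0.005):
    print(p, minimum_group_size(p))
```

With p = 0.01, a group of exactly 100 gives an expected count of exactly one, so the strict "greater than one" condition pushes the size to 101.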
