A comparison study on modeling of clustered and overdispersed count data for multiple comparisons.

Author information

Kruppa Jochen, Hothorn Ludwig

Affiliations

Charité - Universitätsmedizin Berlin, corporate member of Freie Universität Berlin, Humboldt-Universität zu Berlin, and Berlin Institute of Health, Institute of Biometry and Clinical Epidemiology, Berlin, Germany.

Berlin Institute of Health (BIH), Berlin, Germany.

Publication information

J Appl Stat. 2020 Jul 3;48(16):3220-3232. doi: 10.1080/02664763.2020.1788518. eCollection 2021.

DOI:10.1080/02664763.2020.1788518

PMID:35707260

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9042126/

Abstract

Count data arise in many scientific fields. One way to analyze such data is to compare the individual levels of the treatment factor using multiple comparisons. However, the measured individuals are often clustered, e.g. by litter or by rearing. This clustering must be accounted for when estimating the parameters, for example with a repeated-measurements model. In addition, ignoring the overdispersion to which count data are prone inflates the type I error rate. We carry out simulation studies under several different data settings and compare multiple contrast tests based on parameter estimates from generalized estimating equations (GEE) and generalized linear mixed models (GLMMs), in terms of coverage and rejection probabilities. We generate overdispersed, clustered count data in small samples, as observed in many biological settings. We found that GEE outperform GLMMs if the sandwich variance estimator is correctly specified. Furthermore, GLMMs show low convergence rates under certain data settings, although some model implementations are less affected. Finally, we use a genetic data example to demonstrate the application of the multiple contrast test and the problems caused by ignoring strong overdispersion.
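To make the estimation problem concrete, here is a minimal, standard-library-only Python sketch. It is not the authors' code (their analyses are not reproduced here), and every name in it (`poisson_draw`, `simulate_clustered_counts`, `gee_log_mean`, `many_to_one_contrasts`) is hypothetical. It assumes a one-way layout with a log link, a shared gamma multiplier per cluster (e.g. per litter) to induce within-cluster correlation and overdispersion, and an independence working correlation. It compares the cluster-robust (sandwich) variance of each group's log-mean with the naive Poisson variance, then forms many-to-one (Dunnett-type) contrasts against a control. Bonferroni adjustment is swapped in as a simple stand-in for the multivariate-normal-based simultaneous adjustment of a proper multiple contrast test, so its adjusted p-values are conservative.

```python
import math
import random


def poisson_draw(mu, rng):
    """Knuth's multiplication method; adequate for the small means used here."""
    threshold = math.exp(-mu)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def simulate_clustered_counts(group_means, n_clusters, cluster_size, theta, rng):
    """Hypothetical generator: each cluster gets a Gamma(theta, 1/theta) multiplier
    (mean 1, variance 1/theta), so counts within a cluster are correlated and
    marginally overdispersed (Poisson-gamma, i.e. negative-binomial-like)."""
    data = {}
    for g, mu in group_means.items():
        clusters = []
        for _ in range(n_clusters):
            frailty = rng.gammavariate(theta, 1.0 / theta)
            clusters.append([poisson_draw(mu * frailty, rng) for _ in range(cluster_size)])
        data[g] = clusters
    return data


def gee_log_mean(clusters):
    """Intercept-only Poisson GEE (log link, independence working correlation).
    Returns (beta_hat, robust_var, naive_var); assumes the group total is > 0."""
    total = sum(sum(c) for c in clusters)
    n = sum(len(c) for c in clusters)
    ybar = total / n
    beta = math.log(ybar)
    # Meat: squared cluster-level scores, sum_j (y_ij - mu), evaluated at mu = ybar.
    meat = sum((sum(c) - len(c) * ybar) ** 2 for c in clusters)
    # Bread: -dU/dbeta = n * mu = total, so the sandwich is meat / total**2.
    robust_var = meat / total ** 2
    # Model-based Poisson variance: ignores both clustering and overdispersion.
    naive_var = 1.0 / total
    return beta, robust_var, naive_var


def many_to_one_contrasts(data, control):
    """Wald z statistics for log(mu_g / mu_control). Groups use disjoint clusters,
    so their estimates are independent and the variances add. Bonferroni
    (not the multivariate-normal adjustment) controls the familywise error."""
    fits = {g: gee_log_mean(c) for g, c in data.items()}
    b0, v0, _ = fits[control]
    others = [g for g in data if g != control]
    results = {}
    for g in others:
        b, v, _ = fits[g]
        z = (b - b0) / math.sqrt(v + v0)
        p_raw = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
        results[g] = (b - b0, z, min(1.0, p_raw * len(others)))
    return results


if __name__ == "__main__":
    rng = random.Random(2020)
    data = simulate_clustered_counts({"control": 5.0, "dose1": 5.0, "dose2": 10.0},
                                     n_clusters=4, cluster_size=5, theta=2.0, rng=rng)
    for g, clusters in data.items():
        beta, rv, nv = gee_log_mean(clusters)
        print(f"{g}: log-mean={beta:.3f} robust SE={math.sqrt(rv):.3f} naive SE={math.sqrt(nv):.3f}")
    for g, (est, z, p_adj) in many_to_one_contrasts(data, "control").items():
        print(f"{g} vs control: log ratio={est:.3f} z={z:.2f} Bonferroni p={p_adj:.3f}")
```

For an intercept-only Poisson GEE within one group, the bread is the group total T and the meat is the sum over clusters of (S_i - n_i * ybar)^2, where S_i is the cluster total. The sandwich variance of the log-mean is therefore that sum divided by T^2, whereas the naive Poisson variance is 1/T. This plain, uncorrected sandwich tends to be too small when there are only a few clusters, which is why small-sample corrections to it have been proposed.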

Similar articles

Concordance correlation coefficients estimated by modified variance components and generalized estimating equations for longitudinal overdispersed Poisson data.

Stat Methods Med Res. 2022 Feb;31(2):267-286. doi: 10.1177/09622802211065156. Epub 2021 Dec 20.

Comparative assessment of parameter estimation methods in the presence of overdispersion: a simulation study.

Math Biosci Eng. 2019 May 16;16(5):4299-4313. doi: 10.3934/mbe.2019214.

Estimation of capture probabilities using generalized estimating equations and mixed effects approaches.

Ecol Evol. 2014 Apr;4(7):1158-65. doi: 10.1002/ece3.1000. Epub 2014 Mar 10.

Performance of analytical methods for overdispersed counts in cluster randomized trials: sample size, degree of clustering and imbalance.

Stat Med. 2009 Oct 30;28(24):2989-3011. doi: 10.1002/sim.3681.

High-Dimensional Overdispersed Generalized Factor Model With Application to Single-Cell Sequencing Data Analysis.

Stat Med. 2024 Nov 10;43(25):4836-4849. doi: 10.1002/sim.10213. Epub 2024 Sep 5.

Longitudinal beta-binomial modeling using GEE for overdispersed binomial data.

Stat Med. 2017 Mar 15;36(6):1029-1040. doi: 10.1002/sim.7191. Epub 2016 Dec 5.

Longitudinal method comparison: modeling polygenic risk for post-traumatic stress disorder over time in individuals of African and European ancestry.

Front Genet. 2024 May 16;15:1203577. doi: 10.3389/fgene.2024.1203577. eCollection 2024.

Approaches for dealing with various sources of overdispersion in modeling count data: Scale adjustment versus modeling.

Stat Methods Med Res. 2017 Aug;26(4):1802-1823. doi: 10.1177/0962280215588569. Epub 2015 May 31.

A readily available improvement over method of moments for intra-cluster correlation estimation in the context of cluster randomized trials and fitting a GEE-type marginal model for binary outcomes.

Clin Trials. 2019 Feb;16(1):41-51. doi: 10.1177/1740774518803635. Epub 2018 Oct 8.

Cited by

Responses of grassland soil mesofauna to induced climate change.

Sci Rep. 2025 May 13;15(1):16532. doi: 10.1038/s41598-025-01445-w.

Comparative Analysis of mRNA and lncRNA Expression Profiles in Testicular Tissue of Sexually Immature and Sexually Mature Mongolian Horses.

Animals (Basel). 2024 Jun 7;14(12):1717. doi: 10.3390/ani14121717.

The association between social vulnerability and geriatric assessment impairments among older adults with gastrointestinal cancers-The CARE Registry.

Cancer. 2024 Sep 15;130(18):3188-3197. doi: 10.1002/cncr.35390. Epub 2024 Jun 2.

Single-Cell Transcriptome Sequencing Reveals Molecular Expression Differences and Marker Genes in Testes during the Sexual Maturation of Mongolian Horses.

Animals (Basel). 2024 Apr 23;14(9):1258. doi: 10.3390/ani14091258.

Have restrictions on human mobility impacted suicide rates during the COVID-19 pandemic in Japan?

Psychiatry Res. 2022 Nov;317:114898. doi: 10.1016/j.psychres.2022.114898. Epub 2022 Oct 9.

Change point detection for clustered expression data.

BMC Genomics. 2022 Jul 6;23(1):491. doi: 10.1186/s12864-022-08680-9.
