
A new regression model for overdispersed binomial data accounting for outliers and an excess of zeros.

Affiliation

Department of Economics, Management and Statistics, University of Milano-Bicocca, Milan, Italy.

Publication information

Stat Med. 2021 Jul 30;40(17):3895-3914. doi: 10.1002/sim.9005. Epub 2021 May 7.

DOI:10.1002/sim.9005
PMID:33960503
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8360060/
Abstract

Binary outcomes are extremely common in biomedical research. Despite its popularity, binomial regression often fails to model this kind of data accurately due to the overdispersion problem. Many alternatives can be found in the literature, the beta-binomial (BB) regression model being one of the most popular. The additional parameter of this model enables a better fit to overdispersed data. It also exhibits an attractive interpretation in terms of the intraclass correlation coefficient. Nonetheless, in many real data applications, a single additional parameter cannot handle the entire excess of variability. In this study, we propose a new finite mixture distribution with BB components, namely, the flexible beta-binomial (FBB), which is characterized by a richer parameterization. This allows us to enhance the variance structure to account for multiple causes of overdispersion while also preserving the intraclass correlation interpretation. The novel regression model, based on the FBB distribution, exploits the flexibility and large variety of the distribution's possible shapes (which includes bimodality and various tail behaviors). Thus, it succeeds in accounting for several (possibly concomitant) sources of overdispersion stemming from the presence of latent groups in the population, outliers, and excessive zero observations. Adopting a Bayesian approach to inference, we perform an intensive simulation study that shows the superiority of the new regression model over that of the existing ones. Its better performance is also confirmed by three applications to real datasets extensively studied in the biomedical literature, namely, bacteria data, atomic bomb radiation data, and control mice data.
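
As a rough illustration of the overdispersion argument in the abstract, the following Python sketch (standard library only) compares the variance of a binomial, a beta-binomial, and a two-component beta-binomial mixture that all share the same overall mean. The mean and intraclass-correlation parameterization of the BB, the function names, and every numeric value are assumptions chosen for illustration. The mixture is a generic two-component BB mixture with a shared correlation parameter, not the paper's exact FBB parameterization, which the abstract does not spell out.

```python
import math


def log_beta(a, b):
    """Logarithm of the Beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def bb_pmf(k, n, mu, rho):
    """Beta-binomial pmf with mean parameter mu and intraclass correlation rho.

    Assumed parameterization: a = mu(1-rho)/rho, b = (1-mu)(1-rho)/rho,
    so that rho = 1 / (a + b + 1).
    """
    a = mu * (1 - rho) / rho
    b = (1 - mu) * (1 - rho) / rho
    log_choose = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return math.exp(log_choose + log_beta(k + a, n - k + b) - log_beta(a, b))


def mixture_pmf(k, n, weight, mu1, mu2, rho):
    """Two-component BB mixture with a shared rho (illustrative, not the FBB itself)."""
    return weight * bb_pmf(k, n, mu1, rho) + (1 - weight) * bb_pmf(k, n, mu2, rho)


def mean_var(pmf_values):
    """Mean and variance of a distribution given as probabilities on 0..n."""
    mean = sum(k * p for k, p in enumerate(pmf_values))
    var = sum((k - mean) ** 2 * p for k, p in enumerate(pmf_values))
    return mean, var


# Hypothetical settings: 20 trials, overall success probability 0.3.
n, mu, rho = 20, 0.3, 0.1

# Plain binomial variance and the standard BB variance inflation.
binom_var = n * mu * (1 - mu)
bb_var_formula = binom_var * (1 + (n - 1) * rho)

bb = [bb_pmf(k, n, mu, rho) for k in range(n + 1)]
bb_mean, bb_var = mean_var(bb)

# Two latent groups with means 0.1 and 0.5, equal weights: overall mean is still 0.3.
weight, mu1, mu2 = 0.5, 0.1, 0.5
mix = [mixture_pmf(k, n, weight, mu1, mu2, rho) for k in range(n + 1)]
mix_mean, mix_var = mean_var(mix)

print(f"binomial variance:      {binom_var:.2f}")
print(f"beta-binomial variance: {bb_var:.2f}")
print(f"BB mixture variance:    {mix_var:.2f}")
```

With these illustrative values the variances come out at about 4.2 (binomial), 12.18 (BB), and 25.86 (mixture) for the same mean. This is the sense in which latent groups can generate more variability than a single extra dispersion parameter absorbs.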


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d351/8360060/08883cde3068/SIM-40-3895-g005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d351/8360060/9c838a77f3bc/SIM-40-3895-g007.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d351/8360060/fe7a73bf6b5f/SIM-40-3895-g008.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d351/8360060/09cd2d6e3352/SIM-40-3895-g009.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d351/8360060/8aa9ee38eeaa/SIM-40-3895-g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d351/8360060/d23d65ae5395/SIM-40-3895-g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d351/8360060/f0f18b8025f9/SIM-40-3895-g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d351/8360060/45b894c1f897/SIM-40-3895-g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d351/8360060/d714d8964cff/SIM-40-3895-g006.jpg

Similar articles

1
A new regression model for overdispersed binomial data accounting for outliers and an excess of zeros.
Stat Med. 2021 Jul 30;40(17):3895-3914. doi: 10.1002/sim.9005. Epub 2021 May 7.
2
A comparison of statistical methods for modeling count data with an application to hospital length of stay.
BMC Med Res Methodol. 2022 Aug 4;22(1):211. doi: 10.1186/s12874-022-01685-8.
3
On performance of parametric and distribution-free models for zero-inflated and over-dispersed count responses.
Stat Med. 2015 Oct 30;34(24):3235-45. doi: 10.1002/sim.6560. Epub 2015 Jun 15.
4
Marginalized multilevel hurdle and zero-inflated models for overdispersed and correlated count data with excess zeros.
Stat Med. 2014 Nov 10;33(25):4402-19. doi: 10.1002/sim.6237. Epub 2014 Jun 23.
5
Marginalized zero-inflated negative binomial regression with application to dental caries.
Stat Med. 2016 May 10;35(10):1722-35. doi: 10.1002/sim.6804. Epub 2015 Nov 15.
6
Semiparametric models for multilevel overdispersed count data with extra zeros.
Stat Methods Med Res. 2018 Apr;27(4):1187-1201. doi: 10.1177/0962280216657376. Epub 2016 Jul 7.
7
Approaches for dealing with various sources of overdispersion in modeling count data: Scale adjustment versus modeling.
Stat Methods Med Res. 2017 Aug;26(4):1802-1823. doi: 10.1177/0962280215588569. Epub 2015 May 31.
8
A robust regression model for bounded count health data.
Stat Methods Med Res. 2024 Aug;33(8):1392-1411. doi: 10.1177/09622802241259178. Epub 2024 Jun 7.
9
Disease mapping of zero-excessive mesothelioma data in Flanders.
Ann Epidemiol. 2017 Jan;27(1):59-66.e3. doi: 10.1016/j.annepidem.2016.10.006. Epub 2016 Nov 1.
10
A score test for overdispersion in zero-inflated Poisson mixed regression model.
Stat Med. 2007 Mar 30;26(7):1608-22. doi: 10.1002/sim.2616.

Cited by

1
A New Dirichlet-Multinomial Mixture Regression Model for the Analysis of Microbiome Data.
Stat Med. 2025 Aug;44(18-19):e70220. doi: 10.1002/sim.70220.
2
A Lindley-binomial model for analyzing the proportions with sparseness and excessive zeros.
J Appl Stat. 2023 Jul 22;51(9):1792-1817. doi: 10.1080/02664763.2023.2237212. eCollection 2024.

References

1
Stan: A Probabilistic Programming Language.
J Stat Softw. 2017;76. doi: 10.18637/jss.v076.i01. Epub 2017 Jan 11.
2
A new mixed-effects mixture model for constrained longitudinal data.
Stat Med. 2020 Jan 30;39(2):129-145. doi: 10.1002/sim.8406. Epub 2019 Nov 21.
3
Prediction intervals for overdispersed binomial data with application to historical controls.
Stat Med. 2019 Jun 30;38(14):2652-2663. doi: 10.1002/sim.8124. Epub 2019 Mar 5.
4
Estimation for zero-inflated beta-binomial regression model with missing response data.
Stat Med. 2018 Nov 20;37(26):3789-3813. doi: 10.1002/sim.7845. Epub 2018 Jun 10.
5
Comparison of beta-binomial regression model approaches to analyze health-related quality of life data.
Stat Methods Med Res. 2018 Oct;27(10):2989-3009. doi: 10.1177/0962280217690413. Epub 2017 Feb 13.
6
Variable selection for sparse Dirichlet-multinomial regression with an application to microbiome data analysis.
Ann Appl Stat. 2013 Mar 1;7(1). doi: 10.1214/12-AOAS592.
7
A spatial beta-binomial model for clustered count data on dental caries.
Stat Methods Med Res. 2011 Apr;20(2):85-102. doi: 10.1177/0962280210372453. Epub 2010 May 28.
8
Bias-corrected maximum likelihood estimator of the intraclass correlation parameter for binary data.
Stat Med. 2005 Nov 30;24(22):3497-512. doi: 10.1002/sim.2197.
9
Confidence intervals for the risk ratio under cluster sampling based on the beta-binomial model.
Stat Med. 2000 Nov 15;19(21):2933-42. doi: 10.1002/1097-0258(20001115)19:21<2933::aid-sim591>3.0.co;2-q.
10
On a likelihood-based goodness-of-fit test of the beta-binomial model.
Biometrics. 2000 Sep;56(3):947-50. doi: 10.1111/j.0006-341x.2000.947_1.x.