
A new regression model for overdispersed binomial data accounting for outliers and an excess of zeros.

Affiliation

Department of Economics, Management and Statistics, University of Milano-Bicocca, Milan, Italy.

Publication information

Stat Med. 2021 Jul 30;40(17):3895-3914. doi: 10.1002/sim.9005. Epub 2021 May 7.

Abstract

Binary outcomes are extremely common in biomedical research. Despite its popularity, binomial regression often fails to model such data accurately because of overdispersion. Many alternatives exist in the literature, the beta-binomial (BB) regression model being one of the most popular. Its additional parameter enables a better fit to overdispersed data. It also has an attractive interpretation in terms of the intraclass correlation coefficient. Nonetheless, in many real data applications, a single additional parameter cannot account for all of the excess variability. In this study, we propose a new finite mixture distribution with BB components, namely, the flexible beta-binomial (FBB), which is characterized by a richer parameterization. This allows us to enhance the variance structure to account for multiple causes of overdispersion while preserving the intraclass correlation interpretation. The novel regression model, based on the FBB distribution, exploits the flexibility and wide variety of shapes the distribution can take (including bimodality and various tail behaviors). It can therefore account for several (possibly concomitant) sources of overdispersion: latent groups in the population, outliers, and an excess of zero observations. Adopting a Bayesian approach to inference, we perform an extensive simulation study showing that the new regression model outperforms existing ones. Its better performance is also confirmed by three applications to real datasets extensively studied in the biomedical literature, namely, bacteria data, atomic bomb radiation data, and control mice data.
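
The abstract relies on two technical ingredients. The first is the beta-binomial's extra parameter, read as an intraclass correlation ρ. The second is a finite mixture of BB components. The sketch below illustrates both.

It uses the standard BB(n, a, b) parameterization, with π = a/(a+b) and ρ = 1/(a+b+1). Under that parameterization, Var(Y) = nπ(1−π)[1 + (n−1)ρ]. This is the binomial variance inflated by a factor that depends on ρ.

The two-component mixture is a generic BB mixture with hypothetical weights and parameters. It is not the FBB parameterization used in the paper, which the abstract does not specify. All function names are illustrative.

```python
import math


def log_beta(x: float, y: float) -> float:
    """log B(x, y), computed via log-gamma for numerical stability."""
    return math.lgamma(x) + math.lgamma(y) - math.lgamma(x + y)


def bb_pmf(k: int, n: int, a: float, b: float) -> float:
    """P(Y = k) for Y ~ BetaBinomial(n, a, b): C(n, k) B(k + a, n - k + b) / B(a, b)."""
    log_choose = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return math.exp(log_choose + log_beta(k + a, n - k + b) - log_beta(a, b))


def bb_moments(n, a, b):
    """Closed-form mean, variance and intraclass correlation rho of BB(n, a, b)."""
    pi = a / (a + b)           # mean success probability
    rho = 1.0 / (a + b + 1.0)  # intraclass correlation between trials
    mean = n * pi
    # Binomial variance n*pi*(1-pi), inflated by the factor 1 + (n - 1) * rho.
    var = n * pi * (1.0 - pi) * (1.0 + (n - 1) * rho)
    return mean, var, rho


def mixture_pmf(k, n, w, comp1, comp2):
    """Generic (hypothetical) mixture w*BB(n, *comp1) + (1 - w)*BB(n, *comp2)."""
    return w * bb_pmf(k, n, *comp1) + (1.0 - w) * bb_pmf(k, n, *comp2)


def mixture_moments(n, w, comp1, comp2):
    """Mean and variance of the two-component mixture via the law of total variance."""
    m1, v1, _ = bb_moments(n, *comp1)
    m2, v2, _ = bb_moments(n, *comp2)
    mean = w * m1 + (1.0 - w) * m2
    # Within-component variance plus the spread between the component means.
    var = w * v1 + (1.0 - w) * v2 + w * (1.0 - w) * (m1 - m2) ** 2
    return mean, var


def moments_from_pmf(pmf, n):
    """Total mass, mean and variance of a pmf on {0, ..., n}, by direct summation."""
    probs = [pmf(k) for k in range(n + 1)]
    mean = sum(k * p for k, p in enumerate(probs))
    var = sum((k - mean) ** 2 * p for k, p in enumerate(probs))
    return sum(probs), mean, var


if __name__ == "__main__":
    n = 10
    # Single BB with pi = 0.25 and rho = 1/9: variance 3.75, twice the binomial 1.875.
    print(bb_moments(n, 2.0, 6.0))
    print(moments_from_pmf(lambda k: bb_pmf(k, n, 2.0, 6.0), n))
    # Hypothetical mixture: a component concentrated near zero plus a "typical" one.
    w, near_zero, typical = 0.3, (0.5, 20.0), (2.0, 6.0)
    print(mixture_moments(n, w, near_zero, typical))
    print(moments_from_pmf(lambda k: mixture_pmf(k, n, w, near_zero, typical), n))
```

The mixture variance contains the term w(1 − w)(m1 − m2)². This term is variability that comes from the separation between components rather than from any single component's ρ. It illustrates, in a generic way, why mixing BB components can capture sources of overdispersion such as latent groups or a cluster of zeros.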
