A New Regression Model for the Analysis of Overdispersed and Zero-Modified Count Data.

Author information

Bertoli Wesley, Conceição Katiane S, Andrade Marinho G, Louzada Francisco

Affiliations

Department of Statistics, Federal University of Technology, Paraná, Av. Sete de Setembro, 3165 Rebouças, Curitiba 80230-901, PR, Brazil.

Department of Applied Mathematics and Statistics, Institute of Mathematical and Computer Sciences, University of São Paulo, Av. Trab. São Carlense, 400 Parque Arnold Schimidt, São Carlos 13566-590, SP, Brazil.

Publication information

Entropy (Basel). 2021 May 21;23(6):646. doi: 10.3390/e23060646.

DOI:10.3390/e23060646

PMID:34064281

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC8224290/

Abstract

Count datasets are traditionally analyzed using the ordinary Poisson distribution. However, the applicability of this model is limited, as it can be too restrictive for specific data structures. The need then arises for alternative models that accommodate, for example, overdispersion and zero modification (inflation or deflation in the frequency of zeros). In practical terms, these are the most prevalent structures governing discrete phenomena. Hence, the primary goal of this paper was to address these issues jointly by deriving a fixed-effects regression model based on the hurdle version of the Poisson-Sujatha distribution. In this framework, zero modification is incorporated by assuming that a binary probability model determines which outcomes are zero-valued, while a zero-truncated process generates the positive observations. Posterior inferences for the model parameters were obtained from a fully Bayesian approach based on the g-prior method. Intensive Monte Carlo simulation studies were performed to assess the empirical properties of the Bayesian estimators, and the results are discussed. The proposed model was applied to a real dataset, and its competitiveness against some well-established fixed-effects models for count data was evaluated. A sensitivity analysis based on standard divergence measures was performed to detect observations that may impact the parameter estimates. The Bayesian p-value and the randomized quantile residuals were used for model validation.
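
The hurdle construction described in the abstract can be sketched in a few lines. This is an illustrative sketch only, not the authors' implementation. The function names and the parameters `theta` (shape of the Poisson-Sujatha law) and `p` (probability of a positive count) are assumptions introduced here. The Poisson-Sujatha pmf is written as a Poisson mixture over a Sujatha-distributed rate, and the paper's regression structure and g-prior are not shown.

```python
def ps_pmf(y, theta):
    """Poisson-Sujatha pmf: a Poisson count whose rate follows a Sujatha(theta) law."""
    numerator = y * y + (theta + 4.0) * y + (theta * theta + 3.0 * theta + 4.0)
    scale = theta ** 3 / (theta ** 2 + theta + 2.0)
    return scale * numerator / (theta + 1.0) ** (y + 3)


def hurdle_ps_pmf(y, theta, p):
    """Hurdle pmf (hypothetical parameterization).

    A binary component decides whether the outcome is zero (probability 1 - p);
    positive outcomes come from the zero-truncated Poisson-Sujatha law.
    """
    if y == 0:
        return 1.0 - p
    return p * ps_pmf(y, theta) / (1.0 - ps_pmf(0, theta))


if __name__ == "__main__":
    theta, p = 1.5, 0.4
    # Zeros are inflated relative to the base law when 1 - p > ps_pmf(0, theta),
    # and deflated when 1 - p < ps_pmf(0, theta).
    print("base P(Y=0):  ", ps_pmf(0, theta))
    print("hurdle P(Y=0):", hurdle_ps_pmf(0, theta, p))
```

Because the zero probability is set by `p` alone, the same expression covers both zero inflation and zero deflation, which is the sense of "zero modification" used above.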
