Song Hae-Ryoung, Lawson Andrew, D'Agostino Ralph B, Liese Angela D
Department of Epidemiology and Biostatistics and Center for Research in Nutrition and Health Disparities, Arnold School of Public Health, University of South Carolina, Columbia, SC 29208, USA.
Spat Spatiotemporal Epidemiol. 2011 Mar;2(1):23-33. doi: 10.1016/j.sste.2010.09.008.
Sparse count data violate the assumptions of traditional Poisson models because they contain an excess of zeros, which makes such data challenging to model. Aggregating the data reduces sparseness but may bias estimates of risk, so solutions need to be found at the level of disaggregated data. We investigated several statistical approaches within a Bayesian hierarchical framework for modeling sparse data without aggregation. Using simulated data, we compared our proposed models with the traditional Poisson model and the zero-inflated model. We then applied the models to type 1 and type 2 diabetes, both rare diseases, in youth aged 10-19 years, and compared them using the inference results and various model diagnostic tools. One of our proposed models, a sparse Poisson convolution model, performed better than the other models in both the simulation and the application, as judged by the deviance information criterion (DIC) and the mean squared prediction error.
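The claim that excess zeros violate Poisson assumptions can be illustrated with a small simulation. The sketch below is not the authors' sparse Poisson convolution model, which the abstract does not specify; it only draws counts from a generic zero-inflated Poisson and compares the observed share of zeros with the share a plain Poisson with the same mean would imply. The function names and the parameter values (`lam=2.0`, `pi_zero=0.6`, 5000 draws) are illustrative assumptions.

```python
import math
import random


def sample_poisson(rng, lam):
    """Draw one Poisson(lam) variate using Knuth's multiplication method."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def sample_zip(rng, lam, pi_zero):
    """Zero-inflated Poisson: a structural zero with probability pi_zero,
    otherwise an ordinary Poisson(lam) draw."""
    if rng.random() < pi_zero:
        return 0
    return sample_poisson(rng, lam)


def zero_excess(counts):
    """Return (observed zero fraction, zero fraction implied by a Poisson
    distribution with the same sample mean)."""
    mean = sum(counts) / len(counts)
    observed = sum(1 for c in counts if c == 0) / len(counts)
    expected = math.exp(-mean)  # P(Y = 0) for Poisson(mean)
    return observed, expected


def mean_and_variance(counts):
    """Sample mean and (population) variance of the counts."""
    mean = sum(counts) / len(counts)
    variance = sum((c - mean) ** 2 for c in counts) / len(counts)
    return mean, variance


# Illustrative, assumed parameter values (not taken from the paper).
rng = random.Random(0)
counts = [sample_zip(rng, lam=2.0, pi_zero=0.6) for _ in range(5000)]

observed, expected = zero_excess(counts)
mean, variance = mean_and_variance(counts)
print(f"observed zero fraction:  {observed:.3f}")
print(f"Poisson-implied zeros:   {expected:.3f}")
print(f"mean = {mean:.3f}, variance = {variance:.3f}")
```

Under these assumed settings the observed zero fraction should exceed the Poisson-implied one, and the variance should exceed the mean. Both are symptoms of the mismatch that motivates zero-inflated and related models.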