Modelling count data with excessive zeros: the need for class prediction in zero-inflated models and the issue of data generation in choosing between zero-inflated and generic mixture models for dental caries data.

Affiliation

Division of Biostatistics, Centre for Epidemiology and Biostatistics, University of Leeds, Clarendon Way, Leeds, UK.

Publication information

Stat Med. 2009 Dec 10;28(28):3539-53. doi: 10.1002/sim.3699.

Abstract

Count data may possess an 'excess' of zeros relative to standard distributions. Zero-inflated Poisson (ZiP) or binomial (ZiB) and generic mixture models have been proposed to deal with such data. We consider biomedical count data with an excess number of zeros and seek to address the following: (i) do zero-inflated models need covariates in the distribution part to predict class membership; (ii) what model-fit criteria have clinical relevance to predicted counts; (iii) can very different model parameterizations have near-identical fit; and (iv) how could model selection and hence model interpretation be aided by considering data generation processes? We show that covariates in the distribution part of zero-inflated models are needed to predict class membership. A range of model-fit criteria should be considered, as consensus is rarely achieved, and considering predicted outcomes may be just as valuable as likelihood-based criteria. Zero-inflated and generic mixture models may be indistinguishable according to both likelihood-based model-fit criteria and predicted outcomes, in which case model differentiation, hence, model selection and interpretation, might be guided by the consideration of a priori data generation processes. Zero-inflated models reflect whether or not there are (or have been) risk differences in disease onset and disease progression, while generic mixture models identify sub-types of individuals with similar risks of disease onset and progression. One or both modelling strategies may be used, though a priori knowledge or clinical impression of data generation might help to distinguish between two or more parameterizations that exhibit similar fit and yield near-identical predicted counts.
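The two model families compared in the abstract can be illustrated with a minimal sketch. It is not the authors' code, and every function name and parameter value below is an assumption chosen for illustration. It writes out the zero-inflated Poisson probability mass function, a generic two-class Poisson mixture, and the posterior probability that an observed zero belongs to the always-zero class. That posterior depends on the Poisson mean as well as the mixing proportion, which is one way to see why the distribution part of the model bears on class prediction.

```python
import math


def poisson_pmf(y, lam):
    """Poisson probability of count y with mean lam (lam = 0 puts all mass on y = 0)."""
    if lam == 0:
        return 1.0 if y == 0 else 0.0
    return math.exp(-lam) * lam ** y / math.factorial(y)


def zip_pmf(y, pi, lam):
    """Zero-inflated Poisson: always-zero class with probability pi, else Poisson(lam)."""
    structural = pi if y == 0 else 0.0
    return structural + (1.0 - pi) * poisson_pmf(y, lam)


def mixture_pmf(y, weight, lam_low, lam_high):
    """Generic two-class mixture: Poisson(lam_low) with probability weight, else Poisson(lam_high)."""
    return weight * poisson_pmf(y, lam_low) + (1.0 - weight) * poisson_pmf(y, lam_high)


def prob_structural_zero(pi, lam):
    """Posterior probability that an observed zero came from the always-zero class."""
    return pi / zip_pmf(0, pi, lam)


# Illustrative values only; these are not estimates from the paper.
pi, lam = 0.3, 2.0
for y in range(4):
    # Compare ZiP with a mixture whose low-risk class has a small but non-zero mean.
    print(y, round(zip_pmf(y, pi, lam), 4), round(mixture_pmf(y, pi, 0.05, lam), 4))
print(round(prob_structural_zero(pi, lam), 4))
```

Setting `lam_low` to exactly 0 makes the mixture coincide with the zero-inflated model, while a small positive `lam_low` gives probabilities that are close but not identical, which mirrors the abstract's point that different parameterizations can be hard to tell apart from fit alone.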
