Böhning Dankmar, Kuhnert Ronny
School of Applied Statistics, School of Biological Sciences, University of Reading, Reading RG6 6FN, UK.
Biometrics. 2006 Dec;62(4):1207-15. doi: 10.1111/j.1541-0420.2006.00565.x.
This article is about modeling count data with zero truncation. A parametric count density family is considered. The truncated mixture of densities from this family is different from the mixture of truncated densities from the same family. Whereas the former model is more natural to formulate and to interpret, the latter model is theoretically easier to treat. It is shown that for any mixing distribution leading to a truncated mixture, a (usually different) mixing distribution can be found so that the associated mixture of truncated densities equals the truncated mixture, and vice versa. This implies that the likelihood surfaces for both situations agree, and in this sense both models are equivalent. Zero-truncated count data models are used frequently in the capture-recapture setting to estimate population size, and it can be shown that the two Horvitz-Thompson estimators, associated with the two models, agree. In particular, it is possible to achieve strong results for mixtures of truncated Poisson densities, including reliable, global construction of the unique NPMLE (nonparametric maximum likelihood estimator) of the mixing distribution, implying a unique estimator for the population size. The benefit of these results lies in the fact that it is valid to work with the mixture of truncated count densities, which is less appealing for the practitioner but theoretically easier. Mixtures of truncated count densities form a convex linear model, for which a developed theory exists, including global maximum likelihood theory as well as algorithmic approaches. Once the problem has been solved in this class, it might readily be transformed back to the original problem by means of an explicitly given mapping. Applications of these ideas are given, particularly in the case of the truncated Poisson family.
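The equivalence stated in the abstract can be checked numerically for a finite Poisson mixture. The sketch below is illustrative only: the notation (weights q for the truncated mixture, weights p for the mixture of truncated densities, rates lams) and all function names are assumptions made for this example, not taken from the article. The reweighting p_j proportional to q_j (1 - exp(-lambda_j)) follows from elementary algebra on the two densities and is offered as one explicit form such a mapping can take. The two population size estimators are written in their simplest Horvitz-Thompson form, n divided by the estimated probability of a nonzero count.

```python
import math


def poisson_pmf(x, lam):
    """Poisson probability of count x with rate lam."""
    return math.exp(-lam) * lam ** x / math.factorial(x)


def truncated_mixture_pmf(x, q, lams):
    """Mix the Poisson densities with weights q first, then truncate at zero."""
    mixed = sum(qj * poisson_pmf(x, lj) for qj, lj in zip(q, lams))
    p_zero = sum(qj * math.exp(-lj) for qj, lj in zip(q, lams))
    return mixed / (1.0 - p_zero)


def mixture_of_truncated_pmf(x, p, lams):
    """Truncate each Poisson density at zero first, then mix with weights p."""
    return sum(
        pj * poisson_pmf(x, lj) / (1.0 - math.exp(-lj))
        for pj, lj in zip(p, lams)
    )


def q_to_p(q, lams):
    """Map truncated-mixture weights q to mixture-of-truncated weights p."""
    w = [qj * (1.0 - math.exp(-lj)) for qj, lj in zip(q, lams)]
    total = sum(w)
    return [wj / total for wj in w]


def p_to_q(p, lams):
    """Inverse map: mixture-of-truncated weights p back to weights q."""
    w = [pj / (1.0 - math.exp(-lj)) for pj, lj in zip(p, lams)]
    total = sum(w)
    return [wj / total for wj in w]


def ht_truncated_mixture(n, q, lams):
    """Population size estimate n / (1 - P(0)) under the truncated mixture."""
    p_zero = sum(qj * math.exp(-lj) for qj, lj in zip(q, lams))
    return n / (1.0 - p_zero)


def ht_mixture_of_truncated(n, p, lams):
    """Population size estimate under the mixture of truncated densities."""
    return n * sum(pj / (1.0 - math.exp(-lj)) for pj, lj in zip(p, lams))


if __name__ == "__main__":
    # Hypothetical two-component example; the values are arbitrary.
    q = [0.7, 0.3]
    lams = [0.5, 3.0]
    p = q_to_p(q, lams)
    for x in range(1, 6):
        print(x, truncated_mixture_pmf(x, q, lams),
              mixture_of_truncated_pmf(x, p, lams))
    print(ht_truncated_mixture(100, q, lams),
          ht_mixture_of_truncated(100, p, lams))
```

With the weights linked by q_to_p, the two printed densities should agree at every positive count, and so should the two population size estimates. The weights themselves generally differ (p puts more mass on components with larger rates), which matches the abstract's remark that the corresponding mixing distribution is usually a different one.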