Böhning Dankmar, Rocchetti Irene, Alfó Marco, Holling Heinz
Department of Mathematical Sciences and Southampton Statistical Sciences Research Institute, University of Southampton, Highfield, Southampton, SO17 1BJ, UK.
Istituto Nazionale di Statistica, Rome, Italy.
Biometrics. 2016 Sep;72(3):697-706. doi: 10.1111/biom.12485. Epub 2016 Feb 10.
Capture-recapture methods are used to estimate the size of a population of interest that is only partially observed. In such studies, each member of the population carries a count of the number of times it has been identified during the observational period. In real-life applications, only positive counts are recorded, so the observed distribution is truncated at zero, and this truncated count distribution must be used to estimate the number of unobserved units. We consider ratios of neighboring count probabilities, which can be estimated by ratios of observed frequencies regardless of whether the distribution is zero-truncated or untruncated. Rocchetti et al. (2011) have shown that, for densities in the Katz family, these ratios can be modeled by a regression approach, and Rocchetti et al. (2014) have specialized the approach to the beta-binomial distribution. Once the regression model has been estimated, the unobserved frequency of zero counts can be derived directly from it. The guiding principle is that it is often easier to find an appropriate regression model than a proper model for the count distribution. However, a full analysis of the connection between the regression model and the associated count distribution has been missing. In this manuscript, we fill this gap and show that the regression model approach leads, under general conditions, to a valid count distribution; we also consider a wider class of regression models, based on fractional polynomials. The proposed approach is illustrated by analyzing various empirical applications and by means of a simulation study.
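The ratio idea described in the abstract can be illustrated with a minimal sketch. It rests on several assumptions that the abstract does not state: the ratios are taken as r_x = (x+1) f_{x+1} / f_x, a straight line in x is fitted by unweighted least squares (for the Katz family, (x+1) p_{x+1} / p_x is linear in x), and the fitted line is extrapolated to x = 0 to obtain the unobserved zero frequency as f_1 / r_0. The function name and the example frequencies are hypothetical, and the authors' actual estimator (weights, link function, fractional polynomial terms) may well differ.

```python
import numpy as np


def ratio_regression_estimate(freqs):
    """Sketch of a ratio regression estimate of population size.

    freqs[i] is the number of units observed exactly i + 1 times,
    i.e. the zero-truncated frequencies f_1, f_2, ..., f_m.
    Assumption: an unweighted straight-line fit to the ratios.
    """
    f = np.asarray(freqs, dtype=float)
    if f.size < 3 or np.any(f <= 0):
        raise ValueError("need at least three strictly positive frequencies")

    # Ratios r_x = (x + 1) * f_{x+1} / f_x for x = 1, ..., m - 1.
    x = np.arange(1, f.size)
    ratios = (x + 1) * f[1:] / f[:-1]

    # Least-squares line r_x = intercept + slope * x.
    slope, intercept = np.polyfit(x, ratios, deg=1)
    if intercept <= 0:
        raise ValueError("extrapolated ratio at x = 0 is not positive")

    # At x = 0 the ratio is p_1 / p_0, so f_0 is estimated by f_1 / r_0.
    f0_hat = f[0] / intercept
    return {
        "slope": slope,
        "intercept": intercept,
        "f0_hat": f0_hat,
        "N_hat": f.sum() + f0_hat,
    }


if __name__ == "__main__":
    # Hypothetical frequencies of units seen 1, 2, 3, 4, 5 times.
    example = [120, 70, 35, 15, 6]
    result = ratio_regression_estimate(example)
    print(result)
```

For Poisson counts the ratios are constant and equal to the Poisson mean, so the fitted slope should be close to zero; a clearly nonzero slope suggests that a richer ratio model is needed.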