Hook EB, Regal RR
School of Public Health, University of California, Berkeley 94720-7360, USA.
Am J Epidemiol. 1997 Jun 15;145(12):1138-44. doi: 10.1093/oxfordjournals.aje.a009077.
In log-linear capture-recapture approaches to estimating population size, the method of model selection may have a major effect on the estimate. In addition, the estimate may be very sensitive to cells that are null or very sparse, even when multiple sources are used. The authors evaluated 1) various approaches to the issue of model uncertainty and 2) a small sample correction for three or more sources recently proposed by Hook and Regal. Using the known totals in 20 different populations studied by five separate groups of investigators, the authors compared the estimates derived using 1) three different information criteria: Akaike's Information Criterion (AIC) and two alternative formulations of the Bayesian Information Criterion (BIC), one proposed by Draper ("two pi") and one by Schwarz ("not two pi"); 2) two related methods of weighting estimates associated with models; 3) the independent model; and 4) the saturated model. For each method, they also compared the estimates derived with and without the proposed small sample correction. At least in these data sets, the use of AIC appeared on balance to be preferable. The BIC formulation suggested by Draper appeared slightly preferable to that suggested by Schwarz. Adjustment for model uncertainty appeared to improve results slightly. The proposed small sample correction appeared to diminish relative log bias, but only when sparse cells were present; otherwise, its use tended to increase relative log bias. Use of the saturated model (with or without the small sample correction) appears to be optimal if the associated interval is not uselessly large and if one can plausibly exclude an all-source interaction. All other approaches led to an estimate that was too low by about one standard deviation.
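The quantities compared in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code, and several details are assumptions. The three-source saturated-model estimate of the unobserved cell uses the standard closed form n000 = n111·n100·n010·n001 / (n110·n101·n011), which also shows why an empty two-source cell makes the estimate undefined. The Draper ("two pi") BIC is assumed to use the penalty k·ln(n/(2π)), while the Schwarz BIC uses k·ln(n). Here n is taken to be the number of observed individuals. Weights proportional to exp(−ΔIC/2) are one common weighting scheme and are not necessarily either of the two methods the paper used. Relative log bias is assumed to be ln(N̂/N). The Hook-Regal small sample correction is not reproduced because the abstract does not give its form. All function names are hypothetical.

```python
import math


def saturated_missing_cell(n):
    """Estimate the unobserved cell n000 under the three-source saturated
    log-linear model (the three-way interaction is assumed to be zero).
    Keys of `n` are inclusion patterns such as '110' (in sources 1 and 2, not 3)."""
    num = n["111"] * n["100"] * n["010"] * n["001"]
    den = n["110"] * n["101"] * n["011"]
    if den == 0:
        # An empty two-source cell leaves the estimate undefined (the sparse-cell problem).
        raise ZeroDivisionError("a two-source cell is empty; estimate undefined")
    return num / den


def information_criteria(deviance, k, n):
    """Criteria for a fitted model. `deviance` is -2 * maximized log-likelihood
    (only differences between models matter), `k` is the number of parameters,
    and `n` is assumed to be the number of observed individuals."""
    return {
        "AIC": deviance + 2 * k,
        "BIC_schwarz": deviance + k * math.log(n),
        # Assumed form of Draper's "two pi" variant.
        "BIC_draper": deviance + k * math.log(n / (2 * math.pi)),
    }


def ic_weights(ic_values):
    """Normalized weights proportional to exp(-delta/2), where delta is the
    difference from the smallest criterion value."""
    best = min(ic_values)
    raw = [math.exp(-(v - best) / 2) for v in ic_values]
    total = sum(raw)
    return [r / total for r in raw]


def weighted_estimate(estimates, ic_values):
    """Model-averaged population estimate using the criterion-based weights."""
    return sum(w * e for w, e in zip(ic_weights(ic_values), estimates))


def relative_log_bias(estimate, true_total):
    """Assumed definition: ln(estimate / true total); negative means too low."""
    return math.log(estimate / true_total)


if __name__ == "__main__":
    # Hypothetical three-source counts, not data from the paper.
    counts = {"100": 30, "010": 25, "001": 20, "110": 10, "101": 8, "011": 6, "111": 4}
    missing = saturated_missing_cell(counts)
    observed = sum(counts.values())
    print(missing, observed + missing)
```

With these hypothetical counts, the sketch estimates 125.0 unobserved individuals, so N̂ = 103 + 125 = 228. If the "110" count were 0, the function would raise an error instead of returning an estimate.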