Hook E B, Regal R R
School of Public Health, University of California, Berkeley 94720-7360, USA.
Am J Epidemiol. 2000 Oct 15;152(8):771-9. doi: 10.1093/aje/152.8.771.
The authors used "internal validity analysis" to evaluate the performance of various capture-recapture methods. Data from studies with five overlapping, incomplete lists generated subgroups whose known sizes were compared with estimates derived from various four-source capture-recapture analyses. In 15 previously unanalyzed data sets (five subgroups from each of three new studies), the authors observed a trend toward mean underestimation of the known population size by 16-25%. Despite this downward bias, coverage of the 90% confidence intervals associated with the method found to be optimal was acceptable (13/15). The authors conjectured that, with the obvious exception of geographically disparate lists, most data sets used by epidemiologists tend to have a net positive dependence; that is, cases captured by one source are more likely to be captured by some other available source than are cases selected randomly from the population, and this trend results in a bias toward underestimation. Attempts to ensure that the underlying assumptions of the methods are met, such as minimizing (or adequately adjusting for) the possibility of loss due to death or migration, as was undertaken in one exceptional study, appear likely to improve the behavior of these methods.
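The link between positive dependence and underestimation can be illustrated with a minimal sketch. It is not the authors' procedure: the study used four-source analyses, whereas this example assumes the simpler two-source Lincoln-Petersen estimator, and the function names (`lincoln_petersen`, `internal_validity_check`) and all case counts are hypothetical, chosen only for illustration. One list is treated as a subgroup of known size, and that size is then estimated from two other lists restricted to the subgroup.

```python
def lincoln_petersen(n_a, n_b, n_both):
    """Two-source estimate of population size: N_hat = n_a * n_b / n_both.

    Assumes the two sources capture cases independently.
    """
    if n_both == 0:
        raise ValueError("no overlap between sources; estimate is undefined")
    return n_a * n_b / n_both


def internal_validity_check(reference, source_a, source_b):
    """Estimate the size of a known subgroup from two other lists.

    `reference` is the set of case IDs on the list treated as the known
    subgroup. Only cases on that list are used, so the true size is
    len(reference). Returns (known size, estimate, relative bias).
    """
    a = source_a & reference
    b = source_b & reference
    estimate = lincoln_petersen(len(a), len(b), len(a & b))
    known = len(reference)
    return known, estimate, (estimate - known) / known


# Hypothetical subgroup of 1000 cases, identified by integer IDs.
reference = set(range(1000))
source_a = set(range(0, 400))  # source A captures 400 of the 1000 cases

# Independent source B: 500 cases, 200 of them also on A
# (200/400 = 500/1000, so capture by B does not depend on capture by A).
independent_b = set(range(200, 700))

# Positively dependent source B: still 500 cases, but 280 of them on A,
# so cases on A are more likely than average to be on B.
dependent_b = set(range(120, 620))

known, est_indep, bias_indep = internal_validity_check(
    reference, source_a, independent_b
)
_, est_dep, bias_dep = internal_validity_check(reference, source_a, dependent_b)

print(f"known size:          {known}")
print(f"independent sources: {est_indep:.1f} (relative bias {bias_indep:+.1%})")
print(f"positive dependence: {est_dep:.1f} (relative bias {bias_dep:+.1%})")
```

In this toy setting the independent lists recover the known size, while the positively dependent lists yield an estimate below it, because the inflated overlap enters the denominator of the estimator. The size of the shortfall here is a product of the made-up counts and should not be compared with the 16-25% reported in the abstract.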