Hook E B, Regal R R
School of Public Health, University of California, Berkeley 94720.
Am J Epidemiol. 1993 May 15;137(10):1148-66. doi: 10.1093/oxfordjournals.aje.a116618.
Capture-recapture methods in epidemiology analyze data from overlapping lists of cases from various sources of ascertainment to estimate the number of missing cases and the total number affected. Applications of these methods usually recognize the possibility of, and attempt to adjust for, nonindependent ascertainment by the various sources used. However, separate from the issue of dependencies between sources is the complexity of within-source variation in the probability of ascertainment of cases, e.g., variation in ascertainment by population subgroups, such as socioeconomic classes, races, or other subdivisions. The authors present a general approach to this issue for the two-source case that takes account of not only the biases that arise from such "variable catchability" within sources but also the separate complexity of dependencies between sources. A general formula, (K - delta)/(K + delta), is derived that allows simultaneous calculation of the effects of variable catchability, delta, and source dependencies, delta, upon the accuracy of the two-source estimate. The effect of variable catchability upon accuracy and applications to data by race on the neurodegenerative disorder Huntington's disease are presented. In the latter analysis, multiple two-source estimates of prevalence were made, each considering one source versus all others pooled. Most of the likely bias was found to be due to source dependencies; variable catchability contributed relatively little bias. Multiple poolings of all but one source may prove a generally efficient method for overcoming the problem of likely variable catchability, at least when there are data from many distinct sources.
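The "one source versus all others pooled" procedure can be illustrated with a minimal sketch. It assumes the classical two-source estimator (cases in A times cases in B, divided by cases in both), which the abstract does not name explicitly, and it does not implement the (K - delta)/(K + delta) bias formula. The function names, source names, and case identifiers are hypothetical and are not data from the study.

```python
def two_source_estimate(source_a, source_b):
    """Classical two-source capture-recapture estimate of total cases.

    Assumption: the simple estimator n_a * n_b / n_ab, which is unbiased
    only if the sources are independent and catchability is uniform.
    """
    a, b = set(source_a), set(source_b)
    overlap = len(a & b)
    if overlap == 0:
        raise ValueError("no overlap between sources; estimate undefined")
    return len(a) * len(b) / overlap


def one_versus_pooled(sources):
    """For each source, estimate the total using that source versus
    the union of all remaining sources."""
    estimates = {}
    for name, cases in sources.items():
        # Pool every other source into a single list of distinct cases.
        pooled = set().union(
            *(c for other, c in sources.items() if other != name)
        )
        estimates[name] = two_source_estimate(cases, pooled)
    return estimates


# Hypothetical case identifiers, for illustration only.
sources = {
    "hospital": {1, 2, 3, 4, 5, 6},
    "clinic": {4, 5, 6, 7, 8},
    "registry": {1, 4, 7, 9, 10},
}

for name, estimate in one_versus_pooled(sources).items():
    print(f"{name} vs. pooled others: {estimate:.2f}")
```

Comparing the spread of these estimates across sources gives a rough sense of how sensitive the result is to the choice of source; it does not by itself separate the contribution of variable catchability from that of source dependence.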