School of Population and Public Health, University of British Columbia, Vancouver, BC, Canada.
Clin Invest Med. 2022 Jun 26;45(2):E21-27. doi: 10.25011/cim.v45i2.38100.
Disease prevalence estimates from population-based administrative databases are often biased due to measurement (misclassification) errors. The purpose of this article is to review the methodology for estimating disease prevalence in administrative data, with a focus on bias correction.
Several approaches to bias correction in administrative data were reviewed and application of these methods was demonstrated using an example from the literature: physician claims and hospitalization data were employed to estimate diabetes prevalence in Ontario, Canada.
Misclassification bias in prevalence estimates from administrative data can be reduced by developing and selecting an optimal algorithm for case identification, applying a bias correction formula, or using statistical modelling. An algorithm for which sensitivity equals positive predictive value provides an unbiased estimate of prevalence, because the numbers of false positives and false negatives are then equal. Bias reduction methods generally require information about the measurement properties of the algorithm, such as sensitivity, specificity, or predictive value. These properties depend on disease type, prevalence, and algorithm definition (including the observation window), and may vary by population and time. Prevalence estimates can be improved by applying multivariable disease prediction models.
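The relationship between algorithm-positive frequency and true prevalence can be illustrated with a short sketch. The abstract does not say which correction formula the article applies; the code below assumes the standard Rogan-Gladen estimator (based on sensitivity and specificity) and an equivalent correction based on positive predictive value. The function names and all numeric values are hypothetical and are not taken from the article or from the Ontario diabetes example.

```python
def apparent_prevalence(prev, se, sp):
    """Expected fraction of algorithm-positive people, given true prevalence."""
    return se * prev + (1.0 - sp) * (1.0 - prev)


def positive_predictive_value(prev, se, sp):
    """PPV implied by prevalence, sensitivity, and specificity."""
    true_pos = se * prev
    false_pos = (1.0 - sp) * (1.0 - prev)
    return true_pos / (true_pos + false_pos)


def rogan_gladen(apparent, se, sp):
    """Rogan-Gladen correction (assumed here, not confirmed by the source)."""
    denom = se + sp - 1.0
    if denom <= 0:
        raise ValueError("requires sensitivity + specificity > 1")
    estimate = (apparent + sp - 1.0) / denom
    # Clamp to [0, 1]; sampling error can push the raw estimate outside.
    return min(1.0, max(0.0, estimate))


def ppv_corrected(apparent, se, ppv):
    """Correction via PPV: prevalence = apparent * PPV / sensitivity."""
    return apparent * ppv / se


if __name__ == "__main__":
    # Hypothetical illustration values.
    prev, se, sp = 0.10, 0.80, 0.98
    observed = apparent_prevalence(prev, se, sp)
    ppv = positive_predictive_value(prev, se, sp)
    print("apparent:", observed)
    print("Rogan-Gladen corrected:", rogan_gladen(observed, se, sp))
    print("PPV corrected:", ppv_corrected(observed, se, ppv))
```

In the PPV-based form, the corrected prevalence equals the algorithm-positive frequency multiplied by PPV / sensitivity, so the uncorrected frequency is already unbiased exactly when the two are equal. Both corrections are only as good as the sensitivity, specificity, or PPV values supplied to them.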
The frequency of positive results from a case identification algorithm in administrative data is generally not equivalent to disease prevalence. Although prevalence estimates can be corrected for bias using known measurement properties of the algorithm, these properties may be difficult to estimate accurately; therefore, disease prevalence estimates based on administrative data must be treated with caution.