Professor of Medicine and Epidemiology & Community Medicine, University of Ottawa; Senior Scientist, Ottawa Hospital Research Institute; Scientist, Institute for Clinical Evaluative Sciences.
J Clin Epidemiol. 2018 Apr;96:93-100. doi: 10.1016/j.jclinepi.2017.12.012. Epub 2017 Dec 26.
Misclassification bias can result from the incorrect assignment of disease status using inaccurate diagnostic codes in health administrative data. This study quantified misclassification bias in the study of Colles' fracture.
Colles' fracture status was determined in all patients >50 years old seen in the emergency room at a single teaching hospital between 2006 and 2014 by manually reviewing all forearm radiographs. This data set was linked to population-based data capturing all emergency room visits. Reference disease prevalence and its association with covariates were measured. A multivariate model using covariates derived from administrative data was used to impute Colles' fracture status and measure its prevalence and associations using bootstrapping methods. These values were compared with reference values to measure misclassification bias. This was repeated using diagnostic codes to determine Colles' fracture status.
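The imputation step described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that each bootstrap replicate resamples visits with replacement, imputes disease status as a random draw against each visit's model-based probability, and that replicate estimates are summarized by their median and 2.5th/97.5th percentiles. The function names and the synthetic probabilities are hypothetical and only serve to make the example runnable.

```python
import random
import statistics


def bootstrap_impute_prevalence(probabilities, n_boot=200, seed=0):
    """Estimate prevalence from per-visit disease probabilities (sketch).

    Assumed procedure: resample visits with replacement, impute disease
    status by comparing each probability with a uniform random number,
    and record the prevalence in each replicate.
    """
    rng = random.Random(seed)
    n = len(probabilities)
    estimates = []
    for _ in range(n_boot):
        # Bootstrap sample of visits (drawn with replacement).
        sample = rng.choices(probabilities, k=n)
        # Impute status: a visit counts as a case if the draw falls below p.
        cases = sum(1 for p in sample if rng.random() < p)
        estimates.append(cases / n)
    estimates.sort()
    lower = estimates[int(0.025 * (n_boot - 1))]
    upper = estimates[int(0.975 * (n_boot - 1))]
    return statistics.median(estimates), (lower, upper)


def relative_difference(estimate, reference):
    """Relative difference (%) of an estimate from a reference value."""
    return 100.0 * (estimate - reference) / reference


# Synthetic, hypothetical probabilities: a few likely cases among many non-cases.
probs = [0.95] * 30 + [0.5] * 20 + [0.001] * 4950
expected_prevalence = sum(probs) / len(probs)  # what a calibrated model implies

bi_prevalence, (ci_low, ci_high) = bootstrap_impute_prevalence(probs)
bias_pct = relative_difference(bi_prevalence, expected_prevalence)
```

In the study itself the reference value is the prevalence from radiograph review; here the mean of the synthetic probabilities stands in for it, so `bias_pct` only shows how the comparison would be set up.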
A total of 518,744 emergency visits were included, of which 3,538 (0.7%) involved a Colles' fracture. Determining disease status using the diagnostic code (sensitivity 69.4%, positive predictive value 79.9%) resulted in a significant underestimate of Colles' fracture prevalence (relative difference -13.3%) and biased associations with covariates. The Colles' fracture model accurately determined disease probability (c-statistic 98.9 [95% confidence interval {CI} 98.7-99.1], calibration slope 1.009 [95% CI 1.004-1.013], Nagelkerke's R² 0.71 [95% CI 0.70-0.72]). Using disease probability estimates from this model, bootstrap imputation (BI) resulted in minimal misclassification bias (relative difference in disease prevalence -0.01%). The statistical significance of the association between Colles' fracture and age was correctly determined in 32.4% of samples when using the code and in 70.4% when using BI.
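The size of the code-based underestimate follows from the definitions of the two accuracy measures. Sensitivity is true positives divided by true cases, and positive predictive value is true positives divided by code-positive visits, so the ratio of code-based to true prevalence is sensitivity divided by positive predictive value. The short check below uses the rounded figures from the abstract; the function name is hypothetical.

```python
def code_to_true_prevalence_ratio(sensitivity, ppv):
    """Ratio of code-based prevalence to true prevalence.

    code positives = true positives / PPV
                   = (sensitivity * true cases) / PPV
    so code positives / true cases = sensitivity / PPV.
    """
    return sensitivity / ppv


ratio = code_to_true_prevalence_ratio(0.694, 0.799)
relative_difference_pct = 100.0 * (ratio - 1.0)
print(f"relative difference: {relative_difference_pct:.1f}%")
```

With these rounded inputs the result is about -13.1%, which agrees with the reported -13.3% to within the rounding of the published sensitivity and positive predictive value.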
Misclassification bias in estimating disease prevalence and its associations can be minimized with BI using accurate disease probability estimates.