Ladouceur Martin, Rahme Elham, Pineau Christian A, Joseph Lawrence
Division of Clinical Epidemiology, Montreal General Hospital, 687 Pine Avenue West, V-Building, Montreal, Quebec H3A 1A1, Canada.
Biometrics. 2007 Mar;63(1):272-9. doi: 10.1111/j.1541-0420.2006.00665.x.
Because primary data collection can be expensive, researchers are increasingly using information collected in medical administrative databases for scientific purposes. This information, however, is typically collected for reasons other than research, and many such databases have been shown to contain substantial proportions of misclassification errors. For example, many administrative databases contain fields for patient diagnostic codes, but these are often missing or inaccurate, in part because physician reimbursement schemes depend on medical acts performed rather than any diagnosis. Errors in ascertaining which individuals have a given disease bias not only prevalence estimates, but also estimates of associations between the disease and other variables, such as medication use. We attempt to estimate the prevalence of osteoarthritis (OA) among elderly Quebeckers using a government administrative database. We compare a naive estimate relying solely on the physician diagnoses of OA listed in the database to estimates from several different Bayesian latent class models which adjust for misclassified physician diagnostic codes via use of other available diagnostic clues. We find that the prevalence estimates vary widely, depending on the model used and assumptions made. We conclude that any inferences from these databases need to be interpreted with great caution, until further work estimating the reliability of database items is carried out.
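To illustrate how misclassified diagnostic codes can bias a naive prevalence estimate, here is a minimal sketch in Python. It is not the Bayesian latent class approach compared in the paper. It uses the simpler classical Rogan-Gladen correction, which assumes the sensitivity and specificity of the diagnostic code are known exactly. The function names and all numeric values are hypothetical and chosen only for illustration; none come from the study.

```python
def apparent_prevalence(true_prev, sensitivity, specificity):
    """Expected proportion of people flagged as diseased by an imperfect code.

    True positives plus false positives:
    true_prev * Se + (1 - true_prev) * (1 - Sp).
    """
    return true_prev * sensitivity + (1 - true_prev) * (1 - specificity)


def rogan_gladen(apparent_prev, sensitivity, specificity):
    """Back out the true prevalence from the apparent one (Rogan-Gladen).

    Assumes sensitivity and specificity are known, which is rarely the
    case for administrative data. The result is clipped to [0, 1].
    """
    denom = sensitivity + specificity - 1
    if denom <= 0:
        raise ValueError("requires sensitivity + specificity > 1")
    estimate = (apparent_prev + specificity - 1) / denom
    return min(1.0, max(0.0, estimate))


if __name__ == "__main__":
    # Hypothetical values: a naive (apparent) prevalence of 23% and an
    # assumed specificity of 95%, under three assumed sensitivities.
    naive = 0.23
    for se in (0.3, 0.5, 0.8):
        corrected = rogan_gladen(naive, se, 0.95)
        print(f"assumed sensitivity {se:.1f}: corrected prevalence {corrected:.2f}")
```

Under these made-up inputs the corrected estimate moves substantially as the assumed sensitivity changes, which mirrors the abstract's point that prevalence estimates depend heavily on the assumptions made about misclassification.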