Division of Infectious Diseases, Department of Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, USA.
Center for Real-World Evidence and Safety of Therapeutics, Center for Clinical Epidemiology and Biostatistics, and Department of Biostatistics, Epidemiology, and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, USA.
Pharmacoepidemiol Drug Saf. 2023 Jan;32(1):1-8. doi: 10.1002/pds.5537. Epub 2022 Sep 14.
Real-world healthcare data, including administrative and electronic medical record databases, provide a rich source of data for the conduct of pharmacoepidemiologic studies but carry the potential for misclassification of health outcomes of interest (HOIs). Validation studies are important ways to quantify the degree of error associated with case-identifying algorithms for HOIs and are crucial for interpreting study findings within real-world data. This review provides a rationale, framework, and step-by-step approach to validating case-identifying algorithms for HOIs within healthcare databases. Key steps in validating a case-identifying algorithm within a healthcare database include: (1) selecting the appropriate health outcome; (2) determining the reference standard against which to validate the algorithm; (3) developing the algorithm using diagnosis codes, diagnostic tests or their results, procedures, drug therapies, patient-reported symptoms or diagnoses, or some combinations of these parameters; (4) selection of patients and sample sizes for validation; (5) collecting data to confirm the HOI; (6) confirming the HOI; and (7) assessing the algorithm's performance. Additional strategies for algorithm refinement and methods to correct for bias due to misclassification of outcomes are discussed. The review concludes by discussing factors affecting the transportability of case-identifying algorithms and the need for ongoing validation as data elements within healthcare databases, such as diagnosis codes, change over time or new variables, such as patient-generated health data, are included in these data sources.
真实世界的医疗保健数据,包括行政和电子病历数据库,为开展药物流行病学研究提供了丰富的数据来源,但也存在感兴趣的健康结局(HOI)分类错误的潜在风险。验证研究是量化与 HOI 病例识别算法相关错误程度的重要方法,对于在真实世界数据中解释研究结果至关重要。本文提供了在医疗保健数据库中验证 HOI 病例识别算法的原理、框架和逐步方法。在医疗保健数据库中验证病例识别算法的关键步骤包括:(1)选择适当的健康结局;(2)确定验证算法的参考标准;(3)使用诊断代码、诊断测试或其结果、程序、药物治疗、患者报告的症状或诊断,或这些参数的某些组合来开发算法;(4)选择验证用的患者和样本量;(5)收集数据以确认 HOI;(6)确认 HOI;(7)评估算法的性能。还讨论了算法改进的其他策略和纠正因结局分类错误而导致的偏倚的方法。最后,本文讨论了影响病例识别算法可转移性的因素,以及随着医疗保健数据库中诊断代码等数据元素随时间变化或包含患者生成的健康数据等新变量,需要持续验证的必要性。