Hubbard Rebecca A, Lett Elle, Ho Gloria Y F, Chubak Jessica
Department of Biostatistics, Epidemiology & Informatics, University of Pennsylvania, Philadelphia, PA.
Leonard Davis Institute of Health Economics, University of Pennsylvania, Philadelphia, PA.
Health Serv Outcomes Res Methodol. 2021 Sep;21(3):309-323. doi: 10.1007/s10742-020-00235-3. Epub 2021 Jan 4.
Data derived from electronic health records (EHR) are heterogeneous, with the availability of specific measures dependent on the type and timing of patients' healthcare interactions. This creates a challenge for research using EHR-derived exposures because gold-standard exposure data, determined by a definitive assessment, may only be available for a subset of the population. Alternative approaches to exposure ascertainment in this case include restricting the analytic sample to only those patients with gold-standard exposure data available (exclusion); using gold-standard data, when available, and using a proxy exposure measure when the gold standard is unavailable (best available); or using a proxy exposure measure for everyone (common data). Exclusion may induce selection bias in outcome/exposure association estimates, while incorporating information from a proxy exposure via either the best available or common data approaches may result in information bias due to measurement error. The objective of this paper was to explore the bias and efficiency of these three analytic approaches across a broad range of scenarios motivated by a study of the association between chronic hyperglycemia and five-year mortality in an EHR-derived cohort of colon cancer survivors. We found that the best available approach tended to mitigate the inefficiency and selection bias resulting from exclusion while suffering from less information bias than the common data approach. However, bias in all three approaches can be severe, particularly when both selection bias and information bias are present. When the risk of either of these biases is judged to be more than moderate, EHR-based analyses may lead to erroneous conclusions.
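The three exposure-ascertainment strategies can be illustrated with a toy simulation. This is a minimal sketch, not the authors' simulation design: the exposure prevalence, outcome risks, proxy sensitivity and specificity, the model for when the gold standard is recorded, the function names `simulate` and `risk_difference`, and the use of a crude risk difference as the association measure are all hypothetical choices made for illustration. Gold-standard availability is made to depend on both exposure and outcome, so that exclusion yields a selected sample, and the proxy is misclassified nondifferentially, so that the common data approach suffers from information bias.

```python
import random
from typing import Dict, List, Tuple


def risk_difference(exposure: List[int], outcome: List[int]) -> float:
    """Crude risk difference P(Y=1 | E=1) - P(Y=1 | E=0) for binary lists."""
    n1 = sum(exposure)
    n0 = len(exposure) - n1
    y1 = sum(y for e, y in zip(exposure, outcome) if e == 1)
    y0 = sum(y for e, y in zip(exposure, outcome) if e == 0)
    return y1 / n1 - y0 / n0


def simulate(n: int = 20_000, seed: int = 1) -> Dict[str, Tuple[float, int]]:
    """Return {approach: (risk difference, patients used)} for one simulated cohort.

    All parameter values below are hypothetical and chosen only for illustration.
    """
    rng = random.Random(seed)
    x, z, y, avail = [], [], [], []
    for _ in range(n):
        xi = int(rng.random() < 0.30)                    # true exposure, 30% prevalence
        yi = int(rng.random() < (0.25 if xi else 0.10))  # outcome risk: 25% vs 10%
        # Proxy exposure with nondifferential misclassification:
        # sensitivity 0.70, specificity 0.90.
        zi = int(rng.random() < (0.70 if xi else 0.10))
        # Gold standard recorded more often for exposed patients and for patients
        # with the outcome, so complete cases form a selected sample.
        ri = int(rng.random() < 0.20 + 0.40 * xi + 0.30 * yi)
        x.append(xi)
        z.append(zi)
        y.append(yi)
        avail.append(ri)

    # Exclusion: gold standard only, restricted to patients who have it.
    excl_e = [xi for xi, ri in zip(x, avail) if ri]
    excl_y = [yi for yi, ri in zip(y, avail) if ri]
    # Best available: gold standard when recorded, proxy otherwise.
    best_e = [xi if ri else zi for xi, zi, ri in zip(x, z, avail)]

    return {
        "truth": (risk_difference(x, y), n),  # gold standard for everyone (unobservable in practice)
        "exclusion": (risk_difference(excl_e, excl_y), len(excl_e)),
        "best_available": (risk_difference(best_e, y), n),
        "common_data": (risk_difference(z, y), n),  # proxy for everyone
    }


if __name__ == "__main__":
    for name, (rd, used) in simulate().items():
        print(f"{name:>15}: RD = {rd:+.3f} (n = {used})")
```

Changing the availability model or the proxy's sensitivity and specificity lets one explore how selection bias (exclusion) and information bias (common data) trade off, with the best available approach mixing the two.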