Olaker Veronica R, Fry Sarah, Terebuh Pauline, Davis Pamela B, Tisch Daniel J, Xu Rong, Miller Margaret G, Dorney Ian, Palchuk Matvey B, Kaelber David C
Center for Artificial Intelligence in Drug Discovery, Case Western Reserve University School of Medicine, Cleveland, Ohio, USA.
Center for Community Health Integration, Case Western Reserve University School of Medicine, Cleveland, Ohio, USA.
Clin Transl Sci. 2025 Jan;18(1):e70093. doi: 10.1111/cts.70093.
Electronic health records (EHRs), though maintained and used primarily for clinical and billing purposes, can provide a wealth of information for research. Currently available sources offer insight into the health histories of well over a quarter of a billion people. Their use, however, is fraught with hazards, including the introduction or reinforcement of biases, unclear disease definitions, the protection of patient privacy, the definition of covariates and confounders, the accuracy of actual medication usage compared with prescriptions, the need to bring in other data sources such as vaccination or death records and the ensuing potential for inaccuracy, duplicate records, and the difficulty of understanding and interpreting the outcomes of data queries. On the other hand, the ability to study rare disorders or to link apparently disparate events is extremely valuable. Strategies for avoiding the worst pitfalls and hewing to conservative interpretations are essential. This article summarizes many of the approaches that have been used to avoid the most common pitfalls and extract the maximum information from aggregated, standardized, and de-identified EHR data. It covers 26 topics in three major areas: (1) 14 topics on design issues for observational studies using EHR data, (2) 7 topics on issues that arise when analyzing EHR data, and (3) 5 topics on reporting studies that use EHR data.