Medicine Institute, Cleveland Clinic, Cleveland, OH, USA.
Department of Quantitative Health Sciences, Cleveland Clinic, Cleveland, OH, USA.
Med Decis Making. 2021 Feb;41(2):133-142. doi: 10.1177/0272989X20954403. Epub 2020 Sep 24.
Electronic health records (EHRs) offer the potential to study large numbers of patients but are designed for clinical practice, not research. Despite the increasing availability of EHR data, their use in research comes with its own set of challenges. In this article, we describe some important considerations and potential solutions for commonly encountered problems when working with large-scale, EHR-derived data for health services and community-relevant health research. Specifically, using EHR data requires the researcher to define the relevant patient subpopulation, reliably identify the primary care provider, recognize the EHR as containing episodic (i.e., unstructured longitudinal) data, account for changes in health system composition and treatment options over time, understand that the EHR is not always well-organized and accurate, design methods to identify the same patient across multiple health systems, account for the enormous size of the EHR, and consider barriers to data access. Associations found in the EHR may be nonrepresentative of associations in the general population, but a clear understanding of the EHR-based associations can be enormously valuable to the process of improving outcomes for patients in learning health care systems. In the context of building 2 large-scale EHR-derived data sets for health services research, we describe the potential pitfalls of EHR data and propose some solutions for those planning to use EHR data in their research. As ever greater amounts of clinical data are amassed in the EHR, use of these data for research will become increasingly common and important. Attention to the intricacies of EHR data will allow for more informed analysis and interpretation of results from EHR-based data sets.