Division of Rheumatology, University of California School of Medicine, San Francisco, CA, USA.
Department of Epidemiology and Biostatistics, Drexel University Dornsife School of Public Health, 3215 Market St., Philadelphia, PA, 19104, USA.
BMC Med Res Methodol. 2021 Oct 27;21(1):234. doi: 10.1186/s12874-021-01416-5.
Electronic health records (EHRs) are widely used in epidemiological research, but the validity of the results depends on the assumptions made about the healthcare system, the patient, and the provider. In this review, we identify four overarching challenges in using EHR-based data for epidemiological analysis, with a particular emphasis on threats to validity. These challenges are the representativeness of the EHR relative to a target population, the availability and interpretability of clinical and non-clinical data, and missing data at the variable level and at the observation level. Each challenge reveals layers of assumptions that the epidemiologist is required to make, from the point of patient entry into the healthcare system, to the provider documenting the results of the clinical exam, to the longitudinal follow-up of the patient. Each of these assumptions has the potential to bias the results of analyses of these data. Quantifying and remediating potential biases requires a variety of methodological approaches, from traditional sensitivity analyses and validation studies to newer techniques such as natural language processing. Beyond methods to address these challenges, it will remain crucial for epidemiologists to engage with clinicians and informaticians at their institutions, forming multidisciplinary teams around specific research projects to ensure data quality and accessibility.