NC TraCS Institute, UNC-School of Medicine, Chapel Hill, North Carolina, USA.
Department of Biomedical Informatics, Stony Brook University, Stony Brook, New York, USA.
J Am Med Inform Assoc. 2023 May 19;30(6):1125-1136. doi: 10.1093/jamia/ocad057.
Clinical encounter data are heterogeneous and vary greatly from institution to institution. This variance affects the interpretability and usability of clinical encounter data for analysis, and the problems are magnified when multisite electronic health record (EHR) data are networked together. This article presents a novel, generalizable method for resolving encounter heterogeneity for analysis by combining related atomic encounters into composite "macrovisits."
Encounters were composed of data from 75 partner sites harmonized to a common data model as part of the NIH Researching COVID to Enhance Recovery (RECOVER) Initiative, a project of the National COVID Cohort Collaborative (N3C). Summary statistics were computed for overall and site-level data to assess issues and identify modifications. Two algorithms were developed to refine atomic encounters into cleaner, analyzable longitudinal clinical visits.
Atomic inpatient encounter data were found to vary widely between sites in length of stay (LOS) and in the number of OMOP CDM measurements per encounter. After encounters were aggregated into macrovisits, LOS and measurement variance decreased. A subsequent algorithm to identify hospitalized macrovisits further reduced data variability.
Encounters are a complex and heterogeneous component of EHR data, and native data issues are not addressed by existing methods. Such complex and poorly studied issues contribute to the difficulty of deriving value from EHR data, and foundational, large-scale explorations and developments of this kind are necessary to realize the full potential of modern real-world data.
This article presents methods for manipulating and resolving EHR encounter data issues in a generalizable way, as a foundation for future research and analysis.
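To make the idea of combining atomic encounters into a composite macrovisit concrete, the following is a minimal sketch and not the authors' published algorithm. It assumes that "related" encounters are those belonging to the same patient whose date ranges overlap or touch, and it merges them with ordinary interval merging. The names `Encounter`, `Macrovisit`, `build_macrovisits`, and the `max_gap_days` parameter are hypothetical and introduced only for illustration.

```python
from dataclasses import dataclass
from datetime import date
from itertools import groupby


@dataclass(frozen=True)
class Encounter:
    """One atomic encounter (hypothetical, simplified record)."""
    person_id: int
    start: date
    end: date


@dataclass
class Macrovisit:
    """A composite visit built from one or more atomic encounters."""
    person_id: int
    start: date
    end: date
    n_encounters: int

    @property
    def length_of_stay(self) -> int:
        # LOS in days, measured from first start to last end.
        return (self.end - self.start).days


def build_macrovisits(encounters, max_gap_days=0):
    """Merge each patient's overlapping or adjacent encounters.

    Assumption: two encounters are related when the later one starts
    no more than `max_gap_days` days after the current macrovisit ends.
    """
    ordered = sorted(encounters, key=lambda e: (e.person_id, e.start, e.end))
    macrovisits = []
    for person_id, group in groupby(ordered, key=lambda e: e.person_id):
        current = None
        for enc in group:
            if current is not None and (enc.start - current.end).days <= max_gap_days:
                # Extend the open macrovisit to cover this encounter.
                current.end = max(current.end, enc.end)
                current.n_encounters += 1
            else:
                # Start a new macrovisit for this patient.
                current = Macrovisit(person_id, enc.start, enc.end, 1)
                macrovisits.append(current)
    return macrovisits
```

Under these assumptions, two overlapping encounters for one patient collapse into a single macrovisit whose length of stay spans both, while an encounter that starts days later remains a separate visit. A real implementation would also need to account for visit types and site-specific conventions, which this sketch ignores.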