Glynn Earl F, Hoffman Mark A
Children's Mercy Hospital, Children's Research Institute, Kansas City, Missouri, USA.
Department of Pediatrics, University of Missouri Kansas City, Kansas City, Missouri, USA.
JAMIA Open. 2019 Aug 7;2(4):554-561. doi: 10.1093/jamiaopen/ooz035. eCollection 2019 Dec.
Electronic health record (EHR) data aggregated from multiple non-affiliated sources provide an important resource for biomedical research, including digital phenotyping. Unlike work with EHR data from a single organization, aggregate EHR data introduce a number of analysis challenges.
We used the Cerner Health Facts data, a de-identified aggregate EHR data resource populated by data from 100 independent health systems, to investigate the impact of EHR implementation factors on the aggregate data. These factors included use of ancillary modules, data continuity, International Classification of Diseases (ICD) version, and prompts for clinical documentation.
Health Facts includes six categories of data from ancillary modules. We found that, of the 664 facilities in Health Facts, 49 used all six categories, while 88 used none. We evaluated data contribution over time and found considerable variation at the health system and facility levels. We analyzed the transition from ICD-9 to ICD-10 and found that some organizations completed the shift in 2014 while others remained on ICD-9 in 2017, well after the 2015 deadline. We investigated the use of "discharge disposition" to document death and found that this field was used inconsistently. We evaluated clinical events used to document height, smoking history, and travel status, the last of which was implemented in response to Ebola. Smoking history documentation increased dramatically after Meaningful Use, but dropped in some organizations. These observations highlight the need for any research involving aggregate EHR data to consider implementation factors that contribute to variability in the data before attributing gaps to "missing data."
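To make the ICD-9 to ICD-10 comparison concrete, the following is a minimal Python sketch of how per-facility transition timing might be tabulated. It is not the authors' method. The records are made-up illustrative values, the tuple layout (`facility_id`, `year`, `icd_version`) is hypothetical rather than the Health Facts schema, and "completed the shift" is assumed here to mean the first year in which a facility recorded only ICD-10 codes.

```python
from collections import defaultdict

# Hypothetical, made-up records: (facility_id, year, icd_version).
# The real Health Facts tables and column names are not described in the text.
diagnoses = [
    ("A", 2014, "ICD-10"),
    ("A", 2015, "ICD-10"),
    ("B", 2014, "ICD-9"),
    ("B", 2015, "ICD-9"),
    ("B", 2015, "ICD-10"),
    ("B", 2016, "ICD-10"),
    ("C", 2016, "ICD-9"),
    ("C", 2017, "ICD-9"),
]


def icd10_share(records):
    """Return {(facility, year): fraction of diagnoses coded in ICD-10}."""
    totals = defaultdict(int)
    icd10 = defaultdict(int)
    for facility, year, version in records:
        totals[(facility, year)] += 1
        if version == "ICD-10":
            icd10[(facility, year)] += 1
    return {key: icd10[key] / totals[key] for key in totals}


def first_full_icd10_year(shares):
    """Return {facility: first year with only ICD-10 codes, or None}.

    Assumption: a facility has "completed" the transition in the first
    year in which every recorded diagnosis uses ICD-10.
    """
    result = {}
    # Sorting the (facility, year) keys visits each facility's years in order.
    for (facility, year), share in sorted(shares.items()):
        result.setdefault(facility, None)
        if share == 1.0 and result[facility] is None:
            result[facility] = year
    return result


shares = icd10_share(diagnoses)
transition = first_full_icd10_year(shares)
```

In this toy data, facility `C` never reaches a full ICD-10 year and maps to `None`. In an analysis of aggregate data, that result would reflect an implementation factor at the facility rather than missing diagnoses.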