Institute of Medical Statistics, Computer and Data Sciences, Jena University Hospital, Jena, Thüringen, Germany.
Institute for Medical Informatics, Statistics and Epidemiology, University of Leipzig, Leipzig, Germany.
Appl Clin Inform. 2023 Jan;14(1):54-64. doi: 10.1055/s-0042-1760436. Epub 2023 Jan 25.
The growing interest in the secondary use of electronic health record (EHR) data has increased the number of new data integration and data sharing infrastructures. The present work was developed in the context of the German Medical Informatics Initiative, in which 29 university hospitals agreed to use the Health Level Seven Fast Healthcare Interoperability Resources (FHIR) standard in their newly established data integration centers. This standard is optimized for describing and exchanging medical data but is less suitable for standard statistical analysis, which mostly requires tabular data formats.
The objective of this work is to establish a tool that makes FHIR data accessible for standard statistical analysis by providing means to retrieve and transform data from a FHIR server. The tool should be implemented in a programming environment known to most data analysts and offer functions with variable degrees of flexibility and automation catering to users with different levels of FHIR expertise.
We propose the fhircrackr framework, which allows downloading and flattening FHIR resources for data analysis. The framework supports different download and authentication protocols and gives the user full control over the data that is extracted from the FHIR resources and transformed into tables. We implemented it using the programming language R [1] and published it under the GPL-3 open source license.
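To make the idea of flattening concrete, the following is a minimal sketch of how nested FHIR resources in a bundle could be turned into table rows. It is written in Python for illustration only and is not the fhircrackr API: the function names `extract` and `flatten_bundle`, the dotted-path column specification, and the sample bundle are all assumptions made for this example. Repeated elements (such as several given names) are collapsed into one cell with a separator, which is one possible design choice among several.

```python
from typing import Any, Optional


def extract(node: Any, path: list[str]) -> list[str]:
    """Collect every value reachable from `node` along `path`.

    Lists are traversed transparently, so a repeated FHIR element
    yields one value per repetition.
    """
    if isinstance(node, list):
        values: list[str] = []
        for item in node:
            values.extend(extract(item, path))
        return values
    if not path:
        return [str(node)]
    if isinstance(node, dict) and path[0] in node:
        return extract(node[path[0]], path[1:])
    return []


def flatten_bundle(
    bundle: dict,
    resource_type: str,
    columns: dict[str, str],
    sep: str = " | ",
) -> list[dict[str, Optional[str]]]:
    """Return one row per resource of `resource_type` in `bundle`.

    `columns` maps a column name to a dotted path inside the resource
    (hypothetical notation chosen for this sketch).
    """
    rows = []
    for entry in bundle.get("entry", []):
        resource = entry.get("resource", {})
        if resource.get("resourceType") != resource_type:
            continue
        row: dict[str, Optional[str]] = {}
        for column, path in columns.items():
            values = extract(resource, path.split("."))
            # Multiple values share one cell; missing elements become None.
            row[column] = sep.join(values) if values else None
        rows.append(row)
    return rows


# Hand-written sample bundle (illustrative data, not from the study).
sample_bundle = {
    "resourceType": "Bundle",
    "entry": [
        {"resource": {
            "resourceType": "Patient",
            "id": "p1",
            "gender": "female",
            "name": [{"family": "Smith", "given": ["Anna", "Maria"]}],
        }},
        {"resource": {
            "resourceType": "Patient",
            "id": "p2",
            "name": [{"family": "Jones", "given": ["Tom"]}],
        }},
        {"resource": {"resourceType": "Observation", "id": "o1"}},
    ],
}

rows = flatten_bundle(
    sample_bundle,
    resource_type="Patient",
    columns={
        "id": "id",
        "gender": "gender",
        "family": "name.family",
        "given": "name.given",
    },
)

for row in rows:
    print(row)
```

In this sketch the user decides which elements become columns, which mirrors the abstract's point about user control over the extracted data; the actual package may specify columns differently.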
The framework was successfully applied to both publicly available test data and real-world data from several ongoing studies. Processing larger real-world data sets places a considerable burden on computation time and memory consumption, but these challenges can be mitigated by measures such as parallelization and temporary storage mechanisms.
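One way to picture the temporary storage idea is to write each downloaded page of flattened rows to disk as soon as it is available, so that only one page is held in memory at a time. The sketch below is a generic Python illustration under that assumption, not the mechanism implemented in fhircrackr; `spill_rows`, `read_rows`, and `fake_pages` are hypothetical names, and the rows are made-up data.

```python
import csv
import tempfile
from pathlib import Path
from typing import Iterable, Iterator


def spill_rows(
    pages: Iterable[list[dict]],
    fieldnames: list[str],
    directory: str,
) -> list[Path]:
    """Write each page of rows to its own CSV file and return the paths.

    `pages` may be a generator, so pages are consumed one at a time.
    """
    paths = []
    for index, page in enumerate(pages):
        path = Path(directory) / f"page_{index:05d}.csv"
        with path.open("w", newline="", encoding="utf-8") as handle:
            writer = csv.DictWriter(handle, fieldnames=fieldnames)
            writer.writeheader()
            writer.writerows(page)
        paths.append(path)
    return paths


def read_rows(paths: Iterable[Path]) -> Iterator[dict]:
    """Stream rows back from the stored files, one file at a time."""
    for path in paths:
        with path.open(newline="", encoding="utf-8") as handle:
            yield from csv.DictReader(handle)


def fake_pages() -> Iterator[list[dict]]:
    """Stand-in for paged server responses (illustrative data only)."""
    yield [{"id": "p1", "gender": "female"}]
    yield [{"id": "p2", "gender": "male"}, {"id": "p3", "gender": ""}]


with tempfile.TemporaryDirectory() as tmp:
    paths = spill_rows(fake_pages(), ["id", "gender"], tmp)
    combined = list(read_rows(paths))

print(len(paths), "files,", len(combined), "rows")
```

Because each file is independent, the stored pages could also be handed to separate workers for parallel processing; that step is left out here to keep the sketch short.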
The fhircrackr R package provides an open source solution within an environment that is familiar to most data scientists and helps overcome the practical challenges that still hamper the usage of EHR data for research.