Dynamic-ETL: a hybrid approach for health data extraction, transformation and loading.

Author information

Ong Toan C, Kahn Michael G, Kwan Bethany M, Yamashita Traci, Brandt Elias, Hosokawa Patrick, Uhrich Chris, Schilling Lisa M

Affiliations

Departments of Pediatrics, University of Colorado Anschutz Medical Campus, School of Medicine, Building AO1 Room L15-1414, 12631 East 17th Avenue, Mail Stop F563, Aurora, CO, 80045, USA.

Colorado Clinical and Translational Sciences Institute, University of Colorado Anschutz Medical Campus, School of Medicine, Aurora, CO, USA.

Publication information

BMC Med Inform Decis Mak. 2017 Sep 13;17(1):134. doi: 10.1186/s12911-017-0532-3.

Abstract

BACKGROUND

Electronic health records (EHRs) contain detailed clinical data stored in proprietary formats with non-standard codes and structures. Participating in multi-site clinical research networks requires EHR data to be restructured and transformed into a common format and standard terminologies, and optimally linked to other data sources. The expertise and scalable solutions needed to transform data to conform to network requirements are beyond the scope of many health care organizations and there is a need for practical tools that lower the barriers of data contribution to clinical research networks.

METHODS

We designed and implemented a health data transformation and loading approach, which we refer to as Dynamic ETL (Extraction, Transformation and Loading) (D-ETL), that automates part of the process through the use of scalable, reusable and customizable code, while retaining the manual aspects of the process that require knowledge of complex coding syntax. This approach provides the flexibility needed to handle heterogeneous data and varying levels of semantic expertise, together with transparent transformation logic; these properties are essential for implementing ETL conventions across clinical research data-sharing networks. Processing workflows are directed by the ETL specifications guideline, which is developed by ETL designers with extensive knowledge of the structure and semantics of health data (i.e., "health data domain experts") and of the target common data model.
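The abstract does not describe D-ETL's rule format, so the following is only a minimal sketch under stated assumptions of how a manually written rule can drive automated query composition. The `EtlRule` class, the `generate_insert` function and the source table and column names are all hypothetical; the expert supplies one SQL expression per target column, and the generator assembles the full `INSERT ... SELECT` statement. It illustrates the split between manual specification and automated code generation, not the authors' actual engine.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class EtlRule:
    """One row of a hypothetical ETL specification written by a domain expert."""
    target_table: str
    source_table: str
    # Target column -> SQL expression over the source table. These expressions
    # are the manual, expert-written part of the rule.
    column_map: Dict[str, str]
    where: Optional[str] = None


def generate_insert(rule: EtlRule) -> str:
    """Automatically compose an INSERT ... SELECT statement from one rule."""
    if not rule.column_map:
        raise ValueError("a rule must map at least one target column")
    targets = ", ".join(rule.column_map)
    selects = ",\n       ".join(
        f"{expr} AS {col}" for col, expr in rule.column_map.items()
    )
    sql = (
        f"INSERT INTO {rule.target_table} ({targets})\n"
        f"SELECT {selects}\n"
        f"FROM {rule.source_table}"
    )
    if rule.where:
        sql += f"\nWHERE {rule.where}"
    return sql + ";"


# Example: map a hypothetical source patient table into the OMOP person table.
# 8507 and 8532 are the OMOP standard concepts for male and female; 0 means
# "no matching concept". EXTRACT(...) is PostgreSQL-style and dialect-specific.
rule = EtlRule(
    target_table="person",
    source_table="src_patients",
    column_map={
        "person_id": "patient_id",
        "gender_concept_id": "CASE sex WHEN 'M' THEN 8507 WHEN 'F' THEN 8532 ELSE 0 END",
        "year_of_birth": "EXTRACT(YEAR FROM birth_date)",
    },
    where="patient_id IS NOT NULL",
)
print(generate_insert(rule))
```

Keeping each transformation as a plain SQL expression inside the rule keeps the logic readable by domain experts, while the statement boilerplate is generated the same way for every rule.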

RESULTS

D-ETL was implemented to perform ETL operations that load data from various sources with different database schemas into the Observational Medical Outcomes Partnership (OMOP) common data model. The results showed that the ETL rule composition methods and the D-ETL engine offer a scalable solution for health data transformation, using automatic query generation to harmonize source datasets.
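Harmonizing several differently structured sources into one OMOP table can be sketched by generating one `SELECT` per source and combining them with `UNION ALL`. This is a minimal illustration under stated assumptions, not the D-ETL implementation: `SourceMapping`, `generate_harmonized_insert` and the `clinic_a_dx`/`clinic_b_problems` tables are hypothetical, while `person_id`, `condition_source_value` and `condition_start_date` are real columns of the OMOP `condition_occurrence` table.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class SourceMapping:
    """How one hypothetical source table populates a shared target table."""
    source_table: str
    column_map: Dict[str, str]  # target column -> SQL expression on this source


def generate_harmonized_insert(
    target_table: str, target_columns: List[str], mappings: List[SourceMapping]
) -> str:
    """Build one INSERT that unions per-source SELECTs into a single target."""
    if not mappings:
        raise ValueError("at least one source mapping is required")
    branches = []
    for m in mappings:
        missing = [c for c in target_columns if c not in m.column_map]
        if missing:
            raise ValueError(f"{m.source_table} does not map: {missing}")
        # Emit expressions in target-column order so UNION ALL aligns them.
        exprs = ", ".join(f"{m.column_map[c]} AS {c}" for c in target_columns)
        branches.append(f"SELECT {exprs} FROM {m.source_table}")
    header = f"INSERT INTO {target_table} ({', '.join(target_columns)})\n"
    return header + "\nUNION ALL\n".join(branches) + ";"


# Two hypothetical clinics store diagnoses under different schemas.
columns = ["person_id", "condition_source_value", "condition_start_date"]
mappings = [
    SourceMapping("clinic_a_dx", {
        "person_id": "pat_id",
        "condition_source_value": "icd_code",
        "condition_start_date": "dx_date",
    }),
    SourceMapping("clinic_b_problems", {
        "person_id": "mrn",
        "condition_source_value": "code",
        "condition_start_date": "onset",
    }),
]
print(generate_harmonized_insert("condition_occurrence", columns, mappings))
```

Each branch selects the target columns in the same order, so `UNION ALL` aligns them positionally. Raising an error on an unmapped column, rather than silently filling it with `NULL`, keeps gaps in the specification visible to the ETL designer.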

CONCLUSIONS

D-ETL supports a flexible and transparent process to transform and load health data into a target data model. This approach offers a solution that lowers technical barriers that prevent data partners from participating in research data networks, and therefore, promotes the advancement of comparative effectiveness research using secondary electronic health data.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/572a/5598056/0311c255fc87/12911_2017_532_Fig1_HTML.jpg
