Dynamic-ETL: a hybrid approach for health data extraction, transformation and loading.

Authors

Ong Toan C, Kahn Michael G, Kwan Bethany M, Yamashita Traci, Brandt Elias, Hosokawa Patrick, Uhrich Chris, Schilling Lisa M

Affiliations

Departments of Pediatrics, University of Colorado Anschutz Medical Campus, School of Medicine, Building AO1 Room L15-1414, 12631 East 17th Avenue, Mail Stop F563, Aurora, CO, 80045, USA.

Colorado Clinical and Translational Sciences Institute, University of Colorado Anschutz Medical Campus, School of Medicine, Aurora, CO, USA.

Publication information

BMC Med Inform Decis Mak. 2017 Sep 13;17(1):134. doi: 10.1186/s12911-017-0532-3.

DOI:10.1186/s12911-017-0532-3
PMID:28903729
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC5598056/

Abstract

BACKGROUND

Electronic health records (EHRs) contain detailed clinical data stored in proprietary formats with non-standard codes and structures. Participating in multi-site clinical research networks requires EHR data to be restructured and transformed into a common format with standard terminologies, and ideally linked to other data sources. The expertise and scalable solutions needed to transform data to conform to network requirements are beyond the reach of many health care organizations, so there is a need for practical tools that lower the barriers to contributing data to clinical research networks.

METHODS

We designed and implemented a health data transformation and loading approach, which we refer to as Dynamic ETL (Extraction, Transformation and Loading) (D-ETL), that automates part of the process through use of scalable, reusable and customizable code, while retaining manual aspects of the process that require knowledge of complex coding syntax. This approach provides the flexibility needed to handle heterogeneous data and varying levels of semantic expertise, together with the transparency of transformation logic that is essential for implementing ETL conventions across clinical research sharing networks. Processing workflows are directed by the ETL specifications guideline, developed by ETL designers with extensive knowledge of the structure and semantics of health data (i.e., "health data domain experts") and of the target common data model.
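To make the idea of an ETL specification more concrete, the following is a minimal sketch of how column-level rules might be represented and checked against a target model before any data is moved. It is not the D-ETL rule format: the `EtlRule` class, the `TARGET_MODEL` dictionary, the `validate_rules` function, and the source table and column names are all hypothetical, and the target model lists only three columns of a `person` table for illustration.

```python
from dataclasses import dataclass

# Hypothetical, heavily reduced target model: table name -> columns that need a rule.
TARGET_MODEL = {
    "person": {"person_id", "gender_concept_id", "year_of_birth"},
}


@dataclass(frozen=True)
class EtlRule:
    """One row of a hypothetical ETL specification (not the D-ETL format)."""
    target_table: str
    target_column: str
    source_table: str
    source_expression: str  # SQL expression written by a domain expert


def validate_rules(rules, model=TARGET_MODEL):
    """Return a list of readable problems; an empty list means the spec is consistent."""
    problems = []
    mapped = {}
    for rule in rules:
        columns = model.get(rule.target_table)
        if columns is None:
            problems.append(f"unknown target table: {rule.target_table}")
            continue
        if rule.target_column not in columns:
            problems.append(
                f"unknown target column: {rule.target_table}.{rule.target_column}"
            )
            continue
        mapped.setdefault(rule.target_table, set()).add(rule.target_column)
    # Report target columns that no rule populates, for tables the spec touches.
    for table, columns in model.items():
        if table in mapped:
            for missing in sorted(columns - mapped[table]):
                problems.append(f"no rule for {table}.{missing}")
    return problems


example_rules = [
    EtlRule("person", "person_id", "patients", "patients.mrn"),
    EtlRule("person", "year_of_birth", "patients",
            "CAST(substr(patients.dob, 1, 4) AS INTEGER)"),
]
print(validate_rules(example_rules))  # ['no rule for person.gender_concept_id']
```

Keeping the rules as plain data, separate from the code that executes them, is one way to get the transparency described above: a reviewer can read the specification without reading the engine.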

RESULTS

D-ETL was implemented to perform ETL operations that load data from various sources with different database schema structures into the Observational Medical Outcomes Partnership (OMOP) common data model. The results showed that ETL rule composition methods and the D-ETL engine offer a scalable solution for health data transformation via automatic query generation to harmonize source datasets.
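Automatic query generation from rules can be illustrated with a short sketch that composes an `INSERT ... SELECT` statement from column-level mappings and runs it against an in-memory SQLite database. This is an assumption-laden illustration, not the D-ETL engine: the `src_patient` table, the `PERSON_RULES` list, and the `build_insert` and `run_demo` functions are hypothetical, and the `person` table carries only three columns. The gender concept IDs follow the values commonly used in OMOP vocabularies and should be checked against the vocabulary version in use.

```python
import sqlite3

# Hypothetical rules: (target column, source SQL expression). Not taken from the paper.
PERSON_RULES = [
    ("person_id", "p.patient_id"),
    ("gender_concept_id",
     "CASE p.sex WHEN 'F' THEN 8532 WHEN 'M' THEN 8507 ELSE 0 END"),
    ("year_of_birth", "CAST(substr(p.birth_date, 1, 4) AS INTEGER)"),
]


def build_insert(target_table, source_table, alias, rules):
    """Compose one INSERT ... SELECT statement from column-level rules."""
    columns = ", ".join(col for col, _ in rules)
    expressions = ",\n       ".join(f"{expr} AS {col}" for col, expr in rules)
    return (
        f"INSERT INTO {target_table} ({columns})\n"
        f"SELECT {expressions}\n"
        f"FROM {source_table} AS {alias}"
    )


def run_demo():
    """Load two made-up source rows into a reduced person table and return them."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE src_patient (patient_id INTEGER, sex TEXT, birth_date TEXT);
        INSERT INTO src_patient VALUES (1, 'F', '1980-05-17'), (2, 'M', '1975-11-02');
        CREATE TABLE person (
            person_id INTEGER, gender_concept_id INTEGER, year_of_birth INTEGER
        );
    """)
    conn.execute(build_insert("person", "src_patient", "p", PERSON_RULES))
    rows = conn.execute("SELECT * FROM person ORDER BY person_id").fetchall()
    conn.close()
    return rows


print(build_insert("person", "src_patient", "p", PERSON_RULES))
print(run_demo())  # [(1, 8532, 1980), (2, 8507, 1975)]
```

Because table names and expressions are interpolated directly into the SQL text, a generator like this is only appropriate when the rules come from trusted ETL designers rather than from end-user input.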

CONCLUSIONS

D-ETL supports a flexible and transparent process to transform and load health data into a target data model. This approach lowers the technical barriers that prevent data partners from participating in research data networks and therefore promotes the advancement of comparative effectiveness research using secondary electronic health data.

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/572a/5598056/0311c255fc87/12911_2017_532_Fig1_HTML.jpg
Figure 2: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/572a/5598056/14be5068556b/12911_2017_532_Fig2_HTML.jpg
Figure 3: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/572a/5598056/96d78e2331c5/12911_2017_532_Fig3_HTML.jpg
