TransformEHRs: a flexible methodology for building transparent ETL processes for EHR reuse.

Affiliations

Data Science Unit, Instituto de Investigación Sanitaria Hospital Universitario 12 de Octubre, Madrid, Spain.

ETSI Telecomunicación, Universidad Politécnica de Madrid, Madrid, Spain.

Publication information

Methods Inf Med. 2022 Dec;61(S 02):e89-e102. doi: 10.1055/s-0042-1757763. Epub 2022 Oct 11.

DOI:10.1055/s-0042-1757763
PMID:36220109
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9788916/
Abstract

BACKGROUND

During the COVID-19 pandemic, several methodologies were designed for obtaining electronic health record (EHR)-derived datasets for research. These processes often operate as black boxes, leaving clinical researchers unaware of how the data were recorded, extracted, and transformed. To solve this, extract, transform, and load (ETL) processes must be based on transparent, homogeneous, and formal methodologies that make them understandable, reproducible, and auditable.

OBJECTIVES

This study aims to design and implement a methodology, in accordance with the FAIR Principles, for building ETL processes (focused on data extraction, selection, and transformation) for EHR reuse in a transparent and flexible manner, applicable to any clinical condition and health care organization.

METHODS

The proposed methodology comprises four stages: (1) analysis of secondary use models and identification of data operations, based on internationally used clinical repositories, case report forms, and aggregated datasets; (2) modeling and formalization of data operations, through the paradigm of the Detailed Clinical Models; (3) agnostic development of data operations, selecting SQL and R as programming languages; and (4) automation of the ETL instantiation, building a formal configuration file with XML.
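
Stage (4) can be pictured as a small driver that reads an XML configuration and applies the listed data operations in order. The following is a minimal sketch under stated assumptions, not the authors' implementation: the XML element and attribute names, the three operations, and the sample records are all hypothetical, and Python stands in for the SQL and R used in the study.

```python
import xml.etree.ElementTree as ET

# Hypothetical configuration. Element and attribute names are illustrative
# only; they are not the schema defined in the paper.
CONFIG_XML = """
<etl>
  <operation name="select_columns" columns="patient_id,temp_f"/>
  <operation name="filter_min" column="temp_f" threshold="100.0"/>
  <operation name="fahrenheit_to_celsius" source="temp_f" target="temp_c"/>
</etl>
"""


def select_columns(rows, columns):
    """Selection: keep only the named columns of each record."""
    keep = columns.split(",")
    return [{key: row[key] for key in keep} for row in rows]


def filter_min(rows, column, threshold):
    """Selection: keep records whose value is at least the threshold."""
    return [row for row in rows if row[column] >= float(threshold)]


def fahrenheit_to_celsius(rows, source, target):
    """Transformation: add a Celsius column derived from a Fahrenheit one."""
    return [{**row, target: round((row[source] - 32) * 5 / 9, 1)} for row in rows]


# Catalog of available operations, looked up by the name used in the config.
OPERATIONS = {
    "select_columns": select_columns,
    "filter_min": filter_min,
    "fahrenheit_to_celsius": fahrenheit_to_celsius,
}


def run_etl(config_xml, rows):
    """Instantiate the pipeline: apply each configured operation in order."""
    root = ET.fromstring(config_xml)
    for op in root.findall("operation"):
        params = dict(op.attrib)
        name = params.pop("name")
        rows = OPERATIONS[name](rows, **params)
    return rows


# Made-up sample records, for illustration only.
records = [
    {"patient_id": "p1", "temp_f": 98.6, "ward": "A"},
    {"patient_id": "p2", "temp_f": 102.2, "ward": "B"},
]
result = run_etl(CONFIG_XML, records)
print(result)
```

Because the pipeline is described entirely by the configuration file, a reviewer can audit which operations ran, in which order, and with which parameters, without reading the operation code itself.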

RESULTS

First, four international projects were analyzed to identify the 17 operations needed to obtain, from the EHR, datasets that meet the specifications of these projects. Each data operation was then formalized using the ISO 13606 reference model, specifying the valid data types for its arguments, inputs, and outputs, and their cardinality. Next, an agnostic catalog of data was developed using the previously selected data-oriented programming languages (SQL and R). Finally, an automated ETL instantiation process was built from a formally defined ETL configuration file.
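
The idea of formalizing an operation by its argument types and cardinality can be illustrated with a small specification object that is checked before the operation runs. This is a sketch only: the class names, the `max_value` operation, and the use of Python types in place of ISO 13606 data types are assumptions made for illustration and do not come from the paper.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class ArgumentSpec:
    """Formal description of one argument: its type and cardinality."""
    name: str
    data_type: type            # stand-in for an ISO 13606 data type
    min_occurs: int            # lower cardinality bound
    max_occurs: Optional[int]  # upper bound; None means unbounded

    def validate(self, values):
        count = len(values)
        if count < self.min_occurs:
            raise ValueError(f"{self.name}: expected at least {self.min_occurs} value(s)")
        if self.max_occurs is not None and count > self.max_occurs:
            raise ValueError(f"{self.name}: expected at most {self.max_occurs} value(s)")
        for value in values:
            if not isinstance(value, self.data_type):
                raise TypeError(f"{self.name}: {value!r} is not {self.data_type.__name__}")


@dataclass(frozen=True)
class OperationSpec:
    """Formal description of a data operation and its inputs."""
    name: str
    inputs: Tuple[ArgumentSpec, ...]

    def check(self, **supplied):
        # A missing argument is treated as an empty list of values.
        for spec in self.inputs:
            spec.validate(supplied.get(spec.name, []))


# Hypothetical operation: takes one or more floats and returns the largest.
MAX_VALUE = OperationSpec(
    name="max_value",
    inputs=(ArgumentSpec("values", float, 1, None),),
)


def max_value(values):
    """Run the operation only after its inputs satisfy the specification."""
    MAX_VALUE.check(values=values)
    return max(values)


highest = max_value([36.5, 39.0, 37.2])
print(highest)
```

Separating the specification from the implementation is what lets the same operation be described once and then realized in more than one language.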

CONCLUSIONS

This study provides a transparent and flexible solution to the difficulty of making the processes for obtaining EHR-derived data for secondary use understandable, auditable, and reproducible. Moreover, the abstraction carried out in this study means that any previous EHR reuse methodology can incorporate these results.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fb0a/9788916/e0a5c4e39b3c/10-1055-s-0042-1757763-i22020001-4.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fb0a/9788916/ef388e1502df/10-1055-s-0042-1757763-i22020001-1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fb0a/9788916/4d8ab8a00191/10-1055-s-0042-1757763-i22020001-2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fb0a/9788916/649f3892e33b/10-1055-s-0042-1757763-i22020001-3.jpg
