FAIRness through automation: development of an automated medical data integration infrastructure for FAIR health data in a maximum care university hospital.

Affiliations

Department of Medical Informatics, University Medical Center Göttingen, Von-Siebold-Straße 3, 37075, Göttingen, Germany.

University MS Center, Biomedical Research Institute (BIOMED), Hasselt University, Agoralaan Building C, 3590, Diepenbeek, Belgium.

Publication information

BMC Med Inform Decis Mak. 2023 May 15;23(1):94. doi: 10.1186/s12911-023-02195-3.

DOI:10.1186/s12911-023-02195-3

PMID:37189148

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10186636/

Abstract

BACKGROUND

Secondary use of routine medical data is key to large-scale clinical and health services research. In a maximum care hospital, the volume of data generated every day exceeds the limits of big data. These so-called "real world data" are essential to complement knowledge and results from clinical trials. Furthermore, big data may help in establishing precision medicine. However, manual data extraction and annotation workflows to transfer routine data into research data would be complex and inefficient. Generally, best practices for managing research data focus on data output rather than on the entire data journey from primary sources to analysis. Many hurdles have to be overcome before routinely collected data become usable and available for research. In this work, we present the implementation of an automated framework for the timely processing of clinical care data, including unstructured data such as free text and genetic data, and for their centralized storage as Findable, Accessible, Interoperable, Reusable (FAIR) research data in a maximum care university hospital.

METHODS

We identify the data processing workflows necessary to operate a medical research data service unit in a maximum care hospital. We decompose structurally equivalent tasks into elementary sub-processes and propose a framework for general data processing. We base our processes on open-source software components and, where necessary, custom-built generic tools.

RESULTS

We demonstrate the application of our proposed framework in practice by describing its use in our Medical Data Integration Center (MeDIC). Our microservices-based and fully open-source data processing automation framework incorporates a complete recording of data management and manipulation activities. The prototype implementation also includes a metadata schema for data provenance and a process validation concept. All requirements of a MeDIC are orchestrated within the proposed framework: data input from many heterogeneous sources, pseudonymization and harmonization, integration into a data warehouse, and finally options for extracting or aggregating data for research purposes in accordance with data protection requirements.
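The processing chain named above (input, pseudonymization, harmonization, integration, with every activity recorded) can be illustrated with a minimal sketch. This is not the authors' implementation: the `Pipeline` class, the step functions, the `SECRET_KEY` value, the sex-code mapping, and the in-memory `warehouse` list are all hypothetical stand-ins chosen for illustration, and the provenance entries are far simpler than a real metadata schema would be.

```python
import hashlib
import hmac
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Callable

Record = dict[str, str]


@dataclass
class Pipeline:
    """Runs steps in order and records one provenance entry per step."""
    steps: list[tuple[str, Callable[[Record], Record]]]
    provenance: list[dict[str, str]] = field(default_factory=list)

    def run(self, record: Record) -> Record:
        for name, step in self.steps:
            # Pass a copy so a step cannot mutate the caller's input.
            record = step(dict(record))
            self.provenance.append({
                "activity": name,
                "ended_at": datetime.now(timezone.utc).isoformat(),
            })
        return record


# Hypothetical key for the demo; a real key would be managed separately.
SECRET_KEY = b"demo-key-not-for-production"


def pseudonymize(record: Record) -> Record:
    """Replace the patient identifier with a keyed hash (HMAC-SHA256)."""
    digest = hmac.new(SECRET_KEY, record.pop("patient_id").encode(), hashlib.sha256)
    record["pseudonym"] = digest.hexdigest()[:16]
    return record


def harmonize(record: Record) -> Record:
    """Map a source-specific sex code onto a shared vocabulary (illustrative)."""
    mapping = {"m": "male", "f": "female"}
    record["sex"] = mapping.get(record["sex"].lower(), "unknown")
    return record


warehouse: list[Record] = []  # in-memory stand-in for a data warehouse


def load(record: Record) -> Record:
    """Store the processed record in the stand-in warehouse."""
    warehouse.append(record)
    return record


pipeline = Pipeline(steps=[
    ("pseudonymize", pseudonymize),
    ("harmonize", harmonize),
    ("load", load),
])
result = pipeline.run({"patient_id": "12345", "sex": "F"})
```

After the run, `pipeline.provenance` holds one timestamped entry per activity, which is the property that makes a processing chain traceable. Keeping each step a plain function with the same signature mirrors the idea of decomposing tasks into elementary sub-processes, under the assumption that records can be treated as flat key-value maps.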

CONCLUSION

Though the framework is not a panacea for bringing routine-based research data into compliance with FAIR principles, it provides a much-needed possibility to process data in a fully automated, traceable, and reproducible manner.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/78cc/10186636/a1518c31b351/12911_2023_2195_Fig1_HTML.jpg

