A scalable and transparent data pipeline for AI-enabled health data ecosystems.

Author information

Namli Tuncay, Sınacı Ali Anıl, Gönül Suat, Herguido Cristina Ruiz, Garcia-Canadilla Patricia, Muñoz Adriana Modrego, Esteve Arnau Valls, Ertürkmen Gökçe Banu Laleci

Affiliations

SRDC Software Research Development and Consultancy A. Ş., Ankara, Turkey.

Fundacio Sant Joan De Deu, Barcelona, Spain.

Publication information

Front Med (Lausanne). 2024 Jul 30;11:1393123. doi: 10.3389/fmed.2024.1393123. eCollection 2024.

DOI:10.3389/fmed.2024.1393123
PMID:39139784
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11321077/
Abstract

INTRODUCTION

Transparency and traceability are essential for establishing trustworthy artificial intelligence (AI). A lack of transparency in the data preparation process is a significant obstacle to developing reliable AI systems and can lead to problems with reproducibility, debugging of AI models, bias and fairness, and compliance and regulation. We introduce a formal data preparation pipeline specification, with a focus on traceability, to improve upon the manual and error-prone data extraction processes used in AI and data analytics applications.

METHODS

We propose a declarative language to define the extraction of AI-ready datasets from health data that adheres to a common data model, particularly data conforming to HL7 Fast Healthcare Interoperability Resources (FHIR). We use FHIR profiling to develop a common data model tailored to an AI use case, which enables the explicit declaration of the needed information, such as phenotype and AI feature definitions. In our pipeline model, we convert complex, high-dimensional electronic health record (EHR) data, sampled as irregular time series, into a flat structure by defining a target population, feature groups and final datasets. Our design considers the requirements of various AI use cases from different projects, which led to the implementation of many feature types with intricate temporal relations.
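The flattening step described above can be pictured with a small sketch. It is a minimal illustration under stated assumptions, not the authors' declarative language or their FHIR-based implementation. The `Observation`, `Feature` and `PipelineSpec` types, the `flatten` function, the codes and the look-back windows are all hypothetical. The sketch assumes each patient in the target population has a single index time (for example, the start of cardiac surgery), and that each feature aggregates the values of one code inside a look-back window before that time.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import mean
from typing import Callable


@dataclass(frozen=True)
class Observation:
    """Hypothetical, flattened stand-in for a FHIR Observation."""
    patient_id: str
    code: str              # illustrative code, not a real terminology binding
    value: float
    effective: datetime    # irregular sampling: any timestamp is allowed


@dataclass(frozen=True)
class Feature:
    """Hypothetical feature: aggregate one code's values in a look-back window."""
    name: str
    code: str
    window: timedelta
    aggregate: Callable[[list[float]], float]


@dataclass(frozen=True)
class PipelineSpec:
    """Hypothetical spec: target population (patient -> index time) plus features."""
    population: dict[str, datetime]
    features: list[Feature]


def flatten(spec: PipelineSpec, observations: list[Observation]) -> list[dict]:
    """Turn irregularly sampled observations into one flat row per patient."""
    # Sort once by time so that "last value" aggregates see chronological order.
    ordered = sorted(observations, key=lambda o: o.effective)
    rows = []
    for patient_id, index_time in sorted(spec.population.items()):
        row: dict = {"patient_id": patient_id}
        for feat in spec.features:
            values = [
                o.value for o in ordered
                if o.patient_id == patient_id
                and o.code == feat.code
                and index_time - feat.window <= o.effective < index_time
            ]
            # Missing data stays explicit (None) instead of being silently imputed.
            row[feat.name] = feat.aggregate(values) if values else None
        rows.append(row)
    return rows


# Example: two patients sharing an index time (e.g. start of surgery).
t0 = datetime(2024, 1, 10, 8, 0)
observations = [
    Observation("p1", "creatinine", 1.0, t0 - timedelta(days=2)),
    Observation("p1", "creatinine", 1.4, t0 - timedelta(hours=6)),
    Observation("p1", "creatinine", 2.0, t0 + timedelta(hours=1)),  # after index time: ignored
    Observation("p2", "heart_rate", 80.0, t0 - timedelta(hours=3)),
]
spec = PipelineSpec(
    population={"p1": t0, "p2": t0},
    features=[
        Feature("creatinine_last_7d", "creatinine", timedelta(days=7), lambda vs: vs[-1]),
        Feature("creatinine_mean_7d", "creatinine", timedelta(days=7), mean),
        Feature("heart_rate_mean_24h", "heart_rate", timedelta(hours=24), mean),
    ],
)
rows = flatten(spec, observations)
for row in rows:
    print(row)
```

In this sketch every feature is declared as data rather than buried in an ad hoc extraction script. That is one way a specification can be versioned and reviewed, in the spirit of the traceability goal stated in the abstract.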

RESULTS

We implement a scalable, high-performance feature repository to execute the data preparation pipeline definitions. This software provides reliable, fault-tolerant distributed processing to produce AI-ready datasets together with their metadata, including a range of statistics. It also serves as a pluggable component of a decision support application based on a trained AI model, automatically preparing the feature values of individual entities during online prediction. We deployed and tested the proposed methodology and its implementation in three different research projects. Using the training of an AI model for "predicting complications after cardiac surgeries" as an example, we present the FHIR profiles developed as a common data model, together with the feature group definitions and feature definitions of the data preparation pipeline.
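The repository has two roles: batch generation of a dataset with summary statistics, and on-demand feature values for a single entity during online prediction. One way to picture both is to route them through the same feature function. The sketch below is hypothetical. `Feature`, `feature_vector`, `build_dataset`, `dataset_statistics`, the record layout and the chosen statistics are illustrative assumptions. It also runs in a single process, in memory, rather than in the distributed, fault-tolerant setting the authors describe.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import mean
from typing import Callable

# Hypothetical record layout: (patient_id, code, value, timestamp).
Record = tuple[str, str, float, datetime]


@dataclass(frozen=True)
class Feature:
    name: str
    code: str
    window: timedelta                              # look-back window before the index time
    aggregate: Callable[[list[float]], float]


def feature_vector(features: list[Feature], records: list[Record],
                   patient_id: str, index_time: datetime) -> dict:
    """Feature values for one entity; shared by the batch and online paths."""
    mine = sorted((r for r in records if r[0] == patient_id), key=lambda r: r[3])
    out = {}
    for f in features:
        vals = [v for _, code, v, t in mine
                if code == f.code and index_time - f.window <= t < index_time]
        out[f.name] = f.aggregate(vals) if vals else None
    return out


def dataset_statistics(features: list[Feature], rows: list[dict]) -> dict:
    """Per-feature summary kept as metadata next to the generated dataset."""
    stats = {}
    for f in features:
        present = [r[f.name] for r in rows if r[f.name] is not None]
        stats[f.name] = {
            "count": len(present),
            "missing_ratio": 1 - len(present) / len(rows) if rows else None,
            "mean": mean(present) if present else None,
            "min": min(present, default=None),
            "max": max(present, default=None),
        }
    return stats


def build_dataset(features: list[Feature], records: list[Record],
                  population: dict[str, datetime]) -> tuple[dict, dict]:
    """Batch path: one row per patient in the target population, plus statistics."""
    rows = {pid: feature_vector(features, records, pid, t) for pid, t in population.items()}
    return rows, dataset_statistics(features, list(rows.values()))


t0 = datetime(2024, 3, 1, 9, 0)
records: list[Record] = [
    ("p1", "lactate", 2.0, t0 - timedelta(hours=2)),
    ("p1", "lactate", 3.0, t0 - timedelta(hours=1)),
    ("p2", "lactate", 1.0, t0 - timedelta(hours=30)),  # outside the 24 h window
]
features = [Feature("lactate_max_24h", "lactate", timedelta(hours=24), max)]

# Batch (training) path.
rows, stats = build_dataset(features, records, {"p1": t0, "p2": t0})
# Online path: the same function prepares one patient's inputs for a trained model.
online = feature_vector(features, records, "p1", t0)
print(rows, stats, online)
```

Reusing one feature function for both paths keeps training-time and prediction-time feature values consistent. The trained model itself is outside the scope of this sketch.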

DISCUSSION

Implementation across various pilot use cases has demonstrated that our framework has the breadth and flexibility needed to define a diverse array of features, each tailored to specific temporal and contextual criteria.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5197/11321077/44d81a860187/fmed-11-1393123-g005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5197/11321077/421c76ab8dcd/fmed-11-1393123-g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5197/11321077/daea94c2b318/fmed-11-1393123-g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5197/11321077/fa8e38337af0/fmed-11-1393123-g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5197/11321077/2cbadda4a486/fmed-11-1393123-g004.jpg

Similar articles

1
A scalable and transparent data pipeline for AI-enabled health data ecosystems.
Front Med (Lausanne). 2024 Jul 30;11:1393123. doi: 10.3389/fmed.2024.1393123. eCollection 2024.
2
A Standardized Clinical Data Harmonization Pipeline for Scalable AI Application Deployment (FHIR-DHP): Validation and Usability Study.
JMIR Med Inform. 2023 Mar 21;11:e43847. doi: 10.2196/43847.
3
Developing a FHIR-based EHR phenotyping framework: A case study for identification of patients with obesity and multiple comorbidities from discharge summaries.
J Biomed Inform. 2019 Nov;99:103310. doi: 10.1016/j.jbi.2019.103310. Epub 2019 Oct 14.
4
Uncovering Harmonization Potential in Health Care Data Through Iterative Refinement of Fast Healthcare Interoperability Resources Profiles Based on Retrospective Discrepancy Analysis: Case Study.
JMIR Med Inform. 2024 Jul 23;12:e57005. doi: 10.2196/57005.
5
Fast Healthcare Interoperability Resources for Inpatient Deterioration Detection With Time-Series Vital Signs: Design and Implementation Study.
JMIR Med Inform. 2022 Oct 13;10(10):e42429. doi: 10.2196/42429.
6
FHIR-Ontop-OMOP: Building clinical knowledge graphs in FHIR RDF with the OMOP Common data Model.
J Biomed Inform. 2022 Oct;134:104201. doi: 10.1016/j.jbi.2022.104201. Epub 2022 Sep 9.
7
Developing a scalable FHIR-based clinical data normalization pipeline for standardizing and integrating unstructured and structured electronic health record data.
JAMIA Open. 2019 Oct 18;2(4):570-579. doi: 10.1093/jamiaopen/ooz056. eCollection 2019 Dec.
8
MENDS-on-FHIR: leveraging the OMOP common data model and FHIR standards for national chronic disease surveillance.
JAMIA Open. 2024 May 29;7(2):ooae045. doi: 10.1093/jamiaopen/ooae045. eCollection 2024 Jul.
9
A roadmap to artificial intelligence (AI): Methods for designing and building AI ready data to promote fairness.
J Biomed Inform. 2024 Jun;154:104654. doi: 10.1016/j.jbi.2024.104654. Epub 2024 May 11.
10
Electronic Health Record and Semantic Issues Using Fast Healthcare Interoperability Resources: Systematic Mapping Review.
J Med Internet Res. 2024 Jan 30;26:e45209. doi: 10.2196/45209.

Cited by

1
Machine Learning and Artificial Intelligence in Intensive Care Medicine: Critical Recalibrations from Rule-Based Systems to Frontier Models.
J Clin Med. 2025 Jun 6;14(12):4026. doi: 10.3390/jcm14124026.
2
Perspective review: Will generative AI make common data models obsolete in future analyses of distributed data networks?
Ther Adv Drug Saf. 2025 Apr 21;16:20420986251332743. doi: 10.1177/20420986251332743. eCollection 2025.
3
An assessment of the European Patient Summary for clinical research: a case study in cardiology.
Front Med (Lausanne). 2024 Nov 7;11:1481551. doi: 10.3389/fmed.2024.1481551. eCollection 2024.
