From raw data to research-ready: A FHIR-based transformation pipeline in a real-world oncology setting.

Authors

Carbonaro Antonella, Giorgetti Luca, Ridolfi Lorenzo, Pasolini Roberto, Pagliarani Andrea, Cavallucci Martina, Andalò Alice, Gaudio Livia Del, De Angelis Paolo, Vespignani Roberto, Gentili Nicola

Affiliations

Department of Computer Science and Engineering - DISI, University of Bologna, Via dell'Università 50, 47521, Cesena, Italy.

Publication information

Comput Biol Med. 2025 Sep 12;197(Pt B):111051. doi: 10.1016/j.compbiomed.2025.111051.

Abstract

The exponential growth of healthcare data, driven by advancements in medical research and digital health technologies, has underscored the critical need for interoperability and standardization. However, the heterogeneous nature of real-world clinical data poses significant challenges to ensuring seamless data exchange and secondary use for research purposes. These challenges include syntactic inconsistencies (e.g., variable use of terminologies like ICD-10 vs SNOMED CT), semantic mismatches (e.g., differing conceptualizations of disease staging across institutions), and structural fragmentation (e.g., laboratory results encoded in free text rather than structured fields). Fast Healthcare Interoperability Resources (FHIR) has emerged as a leading standard for structuring and harmonizing healthcare data, enabling integration across diverse systems. This work presents a FHIR-based transformation pipeline that leverages Resource Description Framework (RDF) to convert raw, conceptually heterogeneous oncology data into research-ready, semantically enriched datasets. By representing FHIR resources as RDF graphs, our approach enables semantic interoperability, enhances data linkage across heterogeneous sources, and supports automated reasoning through ontology-based queries and inference mechanisms. The pipeline employs a templated conversion strategy, allowing for the declarative definition of mappings that enable domain experts to focus on the data model. In the Cancer Virtual Lab, we applied this methodology to a real-world oncology dataset comprising 36,335 anonymized patient records, successfully converting 1,093,705 clinical records into 1,151,559 distinct RDF-based FHIR resources. The process incorporated syntactic and semantic validation, along with expert review, to ensure technical correctness and clinical relevance.
Our results demonstrate the feasibility of semantically integrating oncology data using FHIR and RDF, fostering machine-readable, interoperable knowledge representation. This enriched representation supports data quality monitoring and improvement, data harmonization, longitudinal analysis, advanced analytics, and AI-driven decision support, promoting large-scale secondary use.
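
The templated conversion strategy mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the template set, the field names (`id`, `gender`, `patient_id`, `icd10`), the `http://example.org/fhir` base IRI, and the helper functions are all hypothetical, and the Turtle property names only loosely follow the FHIR R4 RDF style without having been validated against it. The sketch shows the general idea of keeping mappings declarative (one template per record kind) so that the conversion code itself stays generic.

```python
from string import Template

# Hypothetical declarative mapping: one Turtle template per source record kind.
# Property names loosely follow the FHIR R4 Turtle style and are illustrative only.
PREFIXES = "@prefix fhir: <http://hl7.org/fhir/> .\n"

TEMPLATES = {
    "patient": Template(
        "<$base/Patient/$id> a fhir:Patient ;\n"
        '    fhir:Resource.id [ fhir:value "$id" ] ;\n'
        '    fhir:Patient.gender [ fhir:value "$gender" ] .\n'
    ),
    "diagnosis": Template(
        "<$base/Condition/$id> a fhir:Condition ;\n"
        '    fhir:Resource.id [ fhir:value "$id" ] ;\n'
        "    fhir:Condition.subject [ fhir:link <$base/Patient/$patient_id> ] ;\n"
        "    fhir:Condition.code [ fhir:CodeableConcept.coding "
        '[ fhir:Coding.code [ fhir:value "$icd10" ] ] ] .\n'
    ),
}


def escape_literal(value):
    """Escape backslashes, quotes, and newlines for use inside a Turtle string."""
    text = str(value)
    return text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def to_turtle(kind, record, base="http://example.org/fhir"):
    """Render one raw record as a Turtle snippet using the template for its kind."""
    fields = {key: escape_literal(val) for key, val in record.items()}
    # substitute() raises KeyError when a required field is missing,
    # which surfaces incomplete source records early.
    return TEMPLATES[kind].substitute(base=base, **fields)


def convert(records):
    """Convert an iterable of (kind, record) pairs into one Turtle document."""
    return PREFIXES + "\n".join(to_turtle(kind, rec) for kind, rec in records)


if __name__ == "__main__":
    sample = [
        ("patient", {"id": "p1", "gender": "female"}),
        ("diagnosis", {"id": "c1", "patient_id": "p1", "icd10": "C50.9"}),
    ]
    print(convert(sample))
```

In this sketch, adding a new record kind means adding a template rather than changing the conversion code. A production pipeline would additionally need IRI-safe identifiers and validation of the generated graphs, which the abstract reports as separate syntactic and semantic validation steps.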
