Leveraging Transformers-based models and linked data for deep phenotyping in radiology.

Author information

Hurtado Lluís-F, Marco-Ruiz Luis, Segarra Encarna, Castro-Bleda Maria Jose, Bustos-Moreno Aurelia, Iglesia-Vayá Maria de la, Vallalta-Rueda Juan Francisco

Affiliations

VRAIN: Valencian Research Institute for Artificial Intelligence, Universitat Politècnica de València, Camí de Vera s/n, València, 46020, Spain; ValgrAI: Valencian Graduate School and Research Network of Artificial Intelligence, Camí de Vera s/n, València, 46020, Spain.

Norwegian Centre for E-health Research, University Hospital of North Norway, P.O. Box 35, Tromsø, N-9038, Norway.

Publication information

Comput Methods Programs Biomed. 2025 Mar;260:108567. doi: 10.1016/j.cmpb.2024.108567. Epub 2025 Jan 3.

DOI: 10.1016/j.cmpb.2024.108567

PMID: 39787917

Abstract

BACKGROUND AND OBJECTIVE

Despite significant investment in normalizing and standardizing Electronic Health Records (EHRs), free text is still the rule rather than the exception in clinical notes. This has implications for the data reuse methods that support clinical research, since the query mechanisms used in cohort definition and patient matching rely mainly on structured data and clinical terminologies. This study aims to develop a method for the secondary use of clinical text by: (a) using Natural Language Processing (NLP) to tag clinical notes with biomedical terminology; and (b) designing an ontology that maps and classifies all the identified tags to various terminologies and supports phenotyping queries.

METHODS AND RESULTS

Transformer-based NLP models, specifically pre-trained RoBERTa language models, were used to process radiology reports and annotate them by identifying elements that match the definitions of UMLS Concept Unique Identifiers (CUIs). The CUIs were mapped to several biomedical ontologies useful for phenotyping (e.g., SNOMED-CT, HPO, ICD-10, FMA, LOINC, and ICPC2, among others) and represented as a lightweight ontology using OWL (Web Ontology Language) constructs. This process resulted in a Linked Knowledge Base (LKB), which supports expressive queries that use automatic reasoning to retrieve reports meeting specific criteria.
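As a rough illustration of how mapped concepts and reasoning can work together, the following minimal sketch models subclass axioms as a plain dictionary and answers a query by following them transitively. It is not the authors' implementation: the `ex:` class names, the `CUI_A`/`CUI_B` identifiers, and the report ids are hypothetical placeholders rather than real UMLS or SNOMED-CT codes, and a real LKB would use OWL with a reasoner instead of Python dictionaries.

```python
# Hypothetical subclass axioms (child -> parents). The identifiers are
# placeholders, not real ontology codes.
SUBCLASS_OF = {
    "ex:LobarPneumonia": {"ex:Pneumonia"},
    "ex:Pneumonia": {"ex:LungDisease"},
    "ex:PleuralEffusion": {"ex:PleuralDisease"},
}

# Hypothetical mapping from a tagged concept identifier to ontology classes.
CUI_TO_CLASSES = {
    "CUI_A": {"ex:LobarPneumonia"},
    "CUI_B": {"ex:PleuralEffusion"},
}

# Hypothetical annotations: report id -> concept identifiers found by a tagger.
REPORT_TAGS = {
    "report-1": {"CUI_A"},
    "report-2": {"CUI_B"},
    "report-3": {"CUI_A", "CUI_B"},
}


def ancestors(cls):
    """Return cls plus every superclass reachable through SUBCLASS_OF."""
    seen, stack = set(), [cls]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(SUBCLASS_OF.get(current, ()))
    return seen


def reports_matching(target_class):
    """Find reports tagged with target_class or with any of its subclasses."""
    hits = set()
    for report_id, cuis in REPORT_TAGS.items():
        for cui in cuis:
            classes = CUI_TO_CLASSES.get(cui, ())
            if any(target_class in ancestors(c) for c in classes):
                hits.add(report_id)
    return sorted(hits)
```

In this sketch, a query for the general class `ex:LungDisease` also retrieves reports that were only tagged with the more specific `ex:LobarPneumonia`, which is the kind of inference that a query over flat tags alone would miss.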

CONCLUSION

Although phenotyping tools mostly rely on relational databases, the combination of NLP and Linked Data technologies allows us to build scalable knowledge bases using standard ontologies from the Web of Data. Our approach enables a pipeline whose input is free text and which automatically maps the identified entities to an LKB that can answer phenotyping queries. In this work we used only Spanish radiology reports, although the approach is extensible to other languages for which suitable corpora are available. This is particularly valuable in regional and national systems that handle large research databases from different registries and cohorts, and it plays an essential role in the scalability of large data reuse infrastructures that require indexing and governing distributed data sources.
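The free-text-to-query pipeline described above can be sketched end to end in a few lines. This is only an illustration under stated assumptions: a dictionary lookup (`GAZETTEER`) stands in for the trained RoBERTa tagger, the `CONCEPT_*` identifiers are hypothetical placeholders rather than real CUIs, and the Spanish report snippets are invented examples.

```python
import re

# Hypothetical gazetteer standing in for the trained tagger: Spanish surface
# form -> placeholder concept id (not a real UMLS CUI).
GAZETTEER = {
    "neumonía": "CONCEPT_PNEUMONIA",
    "derrame pleural": "CONCEPT_PLEURAL_EFFUSION",
}


def tag_report(text):
    """Return the placeholder concept ids whose surface form occurs in text."""
    lowered = text.lower()
    return {
        concept_id
        for term, concept_id in GAZETTEER.items()
        if re.search(r"\b" + re.escape(term) + r"\b", lowered)
    }


def build_index(reports):
    """Map each concept id to the ids of the reports that mention it."""
    index = {}
    for report_id, text in reports.items():
        for concept_id in tag_report(text):
            index.setdefault(concept_id, set()).add(report_id)
    return index


def query_all(index, concept_ids):
    """Return the ids of reports that mention every concept in concept_ids."""
    sets = [index.get(concept_id, set()) for concept_id in concept_ids]
    return sorted(set.intersection(*sets)) if sets else []


# Invented example reports, for illustration only.
REPORTS = {
    "r1": "Neumonía en lóbulo inferior derecho.",
    "r2": "Derrame pleural izquierdo. Neumonía basal.",
    "r3": "Sin hallazgos patológicos.",
}
INDEX = build_index(REPORTS)
```

Calling `query_all(INDEX, ["CONCEPT_PNEUMONIA", "CONCEPT_PLEURAL_EFFUSION"])` selects the reports that mention both findings. The lookup ignores negation and context (a report stating that a finding is absent would still match), which is one reason a learned tagger is preferable to string matching in practice.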

Similar articles

Ontology-driven and weakly supervised rare disease identification from clinical notes.

BMC Med Inform Decis Mak. 2023 May 5;23(1):86. doi: 10.1186/s12911-023-02181-9.

Natural language processing to identify lupus nephritis phenotype in electronic health records.

BMC Med Inform Decis Mak. 2024 Mar 3;22(Suppl 2):348. doi: 10.1186/s12911-024-02420-7.

Reshaping free-text radiology notes into structured reports with generative question answering transformers.

Artif Intell Med. 2024 Aug;154:102924. doi: 10.1016/j.artmed.2024.102924. Epub 2024 Jun 26.

SIFR annotator: ontology-based semantic annotation of French biomedical text and clinical notes.

BMC Bioinformatics. 2018 Nov 6;19(1):405. doi: 10.1186/s12859-018-2429-2.

A hybrid framework with large language models for rare disease phenotyping.

BMC Med Inform Decis Mak. 2024 Oct 8;24(1):289. doi: 10.1186/s12911-024-02698-7.

Automated anonymization of radiology reports: comparison of publicly available natural language processing and large language models.

Eur Radiol. 2025 May;35(5):2634-2641. doi: 10.1007/s00330-024-11148-x. Epub 2024 Oct 31.

Normalization and standardization of electronic health records for high-throughput phenotyping: the SHARPn consortium.

J Am Med Inform Assoc. 2013 Dec;20(e2):e341-8. doi: 10.1136/amiajnl-2013-001939. Epub 2013 Nov 4.

Classifying the lifestyle status for Alzheimer's disease from clinical notes using deep learning with weak supervision.

BMC Med Inform Decis Mak. 2022 Jul 7;22(Suppl 1):88. doi: 10.1186/s12911-022-01819-4.

Termviewer - A Web Application for Streamlined Human Phenotype Ontology (HPO) Tagging and Document Annotation.

Chem Biodivers. 2022 Dec;19(12):e202200805. doi: 10.1002/cbdv.202200805. Epub 2022 Nov 3.
