

Leveraging Transformers-based models and linked data for deep phenotyping in radiology.

Author information

Hurtado Lluís-F, Marco-Ruiz Luis, Segarra Encarna, Castro-Bleda Maria Jose, Bustos-Moreno Aurelia, Iglesia-Vayá Maria de la, Vallalta-Rueda Juan Francisco

Affiliations

VRAIN: Valencian Research Institute for Artificial Intelligence, Universitat Politècnica de València, Camí de Vera s/n, València, 46020, Spain; ValgrAI: Valencian Graduate School and Research Network of Artificial Intelligence, Camí de Vera s/n, València, 46020, Spain.

Norwegian Centre for E-health Research, University Hospital of North Norway, P.O. Box 35, Tromsø, N-9038, Norway.

Publication information

Comput Methods Programs Biomed. 2025 Mar;260:108567. doi: 10.1016/j.cmpb.2024.108567. Epub 2025 Jan 3.

Abstract

BACKGROUND AND OBJECTIVE

Despite significant investment in normalizing and standardizing Electronic Health Records (EHRs), free text remains the rule rather than the exception in clinical notes. This has implications for the data reuse methods that support clinical research, since the query mechanisms used in cohort definition and patient matching rely mainly on structured data and clinical terminologies. This study aims to develop a method for the secondary use of clinical text by: (a) using Natural Language Processing (NLP) to tag clinical notes with biomedical terminology; and (b) designing an ontology that maps and classifies all identified tags to various terminologies and supports phenotyping queries.

METHODS AND RESULTS

Transformer-based NLP models, specifically pre-trained RoBERTa language models, were used to process radiology reports and annotate them, identifying elements that match UMLS Concept Unique Identifier (CUI) definitions. The CUIs were mapped to several biomedical ontologies useful for phenotyping (e.g., SNOMED-CT, HPO, ICD-10, FMA, LOINC, and ICPC2) and represented as a lightweight ontology using OWL (Web Ontology Language) constructs. This process resulted in a Linked Knowledge Base (LKB) that supports expressive queries, using automatic reasoning to retrieve the reports that meet specific criteria.
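The role of reasoning in such queries can be illustrated with a minimal sketch. It is not the authors' implementation: it replaces OWL and a reasoner with plain Python dictionaries, and every identifier in it (the `CUI:`, `SCT:` and `ICD:` strings, the report ids, and the function names) is a hypothetical placeholder rather than a real UMLS, SNOMED-CT or ICD-10 code. It assumes the tagging step has already produced a set of concepts per report.

```python
# Hypothetical subclass hierarchy (child -> parent); placeholder ids, not real CUIs.
SUBCLASS_OF = {
    "CUI:pneumonia": "CUI:lung_disease",
    "CUI:lung_disease": "CUI:disease",
    "CUI:rib_fracture": "CUI:fracture",
    "CUI:fracture": "CUI:injury",
}

# Hypothetical cross-terminology mappings for each concept (placeholder codes).
MAPPINGS = {
    "CUI:pneumonia": {"SNOMED-CT": "SCT:p1", "ICD-10": "ICD:p1"},
    "CUI:rib_fracture": {"SNOMED-CT": "SCT:f1"},
}

# Assumed output of the NLP tagging step: report id -> set of concept ids.
REPORT_TAGS = {
    "report-1": {"CUI:pneumonia"},
    "report-2": {"CUI:rib_fracture"},
    "report-3": {"CUI:pneumonia", "CUI:rib_fracture"},
}


def ancestors(concept):
    """Return the concept together with all of its transitive superclasses."""
    seen = []
    while concept is not None and concept not in seen:
        seen.append(concept)
        concept = SUBCLASS_OF.get(concept)
    return set(seen)


def reports_about(concept):
    """Retrieve reports tagged with the concept or with any of its subclasses."""
    return sorted(
        report_id
        for report_id, tags in REPORT_TAGS.items()
        if any(concept in ancestors(tag) for tag in tags)
    )


def codes_for(report_id, terminology):
    """List the codes of one terminology that a report's tags map to."""
    return sorted(
        MAPPINGS[tag][terminology]
        for tag in REPORT_TAGS[report_id]
        if terminology in MAPPINGS.get(tag, {})
    )
```

Under these assumptions, `reports_about("CUI:lung_disease")` returns the two reports tagged with pneumonia even though neither is tagged with the broader concept directly. That inference over the hierarchy is what distinguishes this kind of query from a flat keyword or code lookup.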

CONCLUSION

Although phenotyping tools mostly rely on relational databases, combining NLP with Linked Data technologies allows us to build scalable knowledge bases using standard ontologies from the Web of Data. Our approach runs a pipeline whose input is free text and which automatically maps the identified entities to an LKB that can answer phenotyping queries. In this work we used only Spanish radiology reports, although the approach is extensible to other languages for which suitable corpora are available. This is particularly valuable in regional and national systems that handle large research databases from different registries and cohorts, and it plays an essential role in the scalability of large data reuse infrastructures that must index and govern distributed data sources.
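The free-text entry point of such a pipeline can be sketched as follows. This is a toy stand-in, not the method described above: it swaps the RoBERTa-based tagger for a simple dictionary lookup, and the lexicon, the `CUI:` placeholder ids, and the function names are all hypothetical. It also ignores negation, so a phrase such as "sin neumonía" would still be tagged.

```python
import re
import unicodedata

# Hypothetical lexicon: normalized Spanish term -> placeholder concept id.
LEXICON = {
    "neumonia": "CUI:pneumonia",
    "derrame pleural": "CUI:pleural_effusion",
    "fractura costal": "CUI:rib_fracture",
}


def normalize(text):
    """Lowercase and strip combining accents so 'Neumonía' matches 'neumonia'."""
    decomposed = unicodedata.normalize("NFD", text.lower())
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


def tag_report(text):
    """Return the sorted (term, concept) pairs whose term occurs in the text."""
    clean = normalize(text)
    found = []
    for term, concept in LEXICON.items():
        if re.search(r"\b" + re.escape(term) + r"\b", clean):
            found.append((term, concept))
    return sorted(found)


def build_index(reports):
    """Map each concept id to the ids of the reports that mention it."""
    index = {}
    for report_id, text in reports.items():
        for _, concept in tag_report(text):
            index.setdefault(concept, []).append(report_id)
    return index
```

For example, `tag_report("Neumonía en lóbulo inferior derecho con derrame pleural.")` yields the pneumonia and pleural effusion concepts, and `build_index` turns a collection of such reports into a concept-to-report index that a knowledge base could then be loaded from.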

