Department of Computer Science and Engineering and IT, Shiraz University, Shiraz, Iran.
Shiraz University of Medical Sciences, Shiraz, Iran.
J Digit Imaging. 2023 Feb;36(1):80-90. doi: 10.1007/s10278-022-00692-x. Epub 2022 Aug 24.
Because radiology reports needed for clinical practice and research are written and stored as free-text narratives, extracting relevant information for further analysis is difficult. Natural language processing (NLP) techniques can automate information extraction and convert free text into structured data. In recent years, deep learning (DL)-based models have been adapted for NLP tasks with promising results. Despite the significant potential of DL models based on artificial neural networks (ANN) and convolutional neural networks (CNN), these models face limitations when implemented in clinical practice. Transformers, a more recent DL architecture, have been increasingly applied to improve the process. In this study, we therefore propose a transformer-based fine-grained named entity recognition (NER) architecture for clinical information extraction. We collected 88 free-text abdominopelvic sonography reports and annotated them according to an information schema that we developed. The text-to-text transfer transformer (T5) model and SciFive, a pre-trained domain-specific adaptation of T5, were fine-tuned to extract entities and relations and to transform the input into a structured format. Our transformer-based model outperformed previously applied approaches such as ANN and CNN models, achieving ROUGE-1, ROUGE-2, ROUGE-L, and BLEU scores of 0.816, 0.668, 0.528, and 0.743, respectively, while providing an interpretable structured report.
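The abstract reports ROUGE-1 and ROUGE-2 scores without describing how they were computed. As a rough illustration of what ROUGE-N measures, the following minimal sketch computes n-gram overlap between a generated structured output and a reference. It assumes simple lowercase whitespace tokenization, and the function names `ngrams` and `rouge_n` and the two example strings are hypothetical; none of this is taken from the study's code, schema, or data.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return a multiset (Counter) of the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n(candidate, reference, n=1):
    """Compute ROUGE-N precision, recall, and F1 on whitespace tokens.

    Assumption: lowercase whitespace tokenization, which is simpler than
    the tokenizers used by standard ROUGE packages.
    """
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    overlap = sum((cand & ref).values())  # clipped n-gram matches
    precision = overlap / max(sum(cand.values()), 1)
    recall = overlap / max(sum(ref.values()), 1)
    f1 = 0.0 if overlap == 0 else 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical strings for illustration only; not taken from the study's data.
reference = "organ: liver | finding: cyst | size: 12 mm"
candidate = "organ: liver | finding: cyst | size: 10 mm"

p1, r1, f1 = rouge_n(candidate, reference, n=1)
p2, r2, f2 = rouge_n(candidate, reference, n=2)
print(f"ROUGE-1 F1 = {f1:.3f}, ROUGE-2 F1 = {f2:.3f}")
```

A single wrong token lowers the bigram score more than the unigram score because it breaks two adjacent bigrams, which is consistent with ROUGE-2 generally being lower than ROUGE-1 for the same output.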