

RadBERT: Adapting Transformer-based Language Models to Radiology.

Authors

Yan An, McAuley Julian, Lu Xing, Du Jiang, Chang Eric Y, Gentili Amilcare, Hsu Chun-Nan

Affiliations

University of California, San Diego, 9500 Gilman Dr, La Jolla, CA 92093-0608 (A.Y., J.M., X.L., J.D., E.Y.C., A.G., C.N.H.); and Veterans Affairs San Diego Healthcare System, San Diego, Calif (E.Y.C., A.G.).

Publication Information

Radiol Artif Intell. 2022 Jun 15;4(4):e210258. doi: 10.1148/ryai.210258. eCollection 2022 Jul.

DOI: 10.1148/ryai.210258
PMID: 35923376
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9344353/
Abstract

PURPOSE

To investigate if tailoring a transformer-based language model to radiology is beneficial for radiology natural language processing (NLP) applications.

MATERIALS AND METHODS

This retrospective study presents a family of bidirectional encoder representations from transformers (BERT)-based language models adapted for radiology, named RadBERT. Transformers were pretrained with either 2.16 or 4.42 million radiology reports from U.S. Department of Veterans Affairs health care systems nationwide on top of four different initializations (BERT-base, Clinical-BERT, robustly optimized BERT pretraining approach [RoBERTa], and BioMed-RoBERTa) to create six variants of RadBERT. Each variant was fine-tuned for three representative NLP tasks in radiology: abnormal sentence classification: models classified sentences in radiology reports as reporting abnormal or normal findings; report coding: models assigned a diagnostic code to a given radiology report for five coding systems; and report summarization: given the findings section of a radiology report, models selected key sentences that summarized the findings. Model performance was compared by bootstrap resampling with five intensively studied transformer language models as baselines: BERT-base, BioBERT, Clinical-BERT, BlueBERT, and BioMed-RoBERTa.
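The methods compare model performance "by bootstrap resampling" against the baselines. A minimal sketch of what a paired bootstrap comparison can look like (the function name, resample count, and confidence-interval protocol here are illustrative assumptions, not details taken from the paper):

```python
import random

def bootstrap_diff_ci(scores_a, scores_b, n_resamples=2000, alpha=0.05, seed=0):
    """Paired bootstrap confidence interval for the mean score difference
    between two models evaluated on the same test examples.
    scores_a/scores_b: per-example metric values (e.g. 0/1 correctness)."""
    rng = random.Random(seed)
    n = len(scores_a)
    diffs = []
    for _ in range(n_resamples):
        # Resample test-example indices with replacement (paired: same
        # indices are used for both models).
        idx = [rng.randrange(n) for _ in range(n)]
        mean_a = sum(scores_a[i] for i in idx) / n
        mean_b = sum(scores_b[i] for i in idx) / n
        diffs.append(mean_a - mean_b)
    diffs.sort()
    lo = diffs[int((alpha / 2) * n_resamples)]
    hi = diffs[int((1 - alpha / 2) * n_resamples) - 1]
    return lo, hi
```

If the resulting interval for the difference excludes zero, the gap between the two models is unlikely to be a test-set sampling artifact, which is the sense in which the paper reports differences as significant.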

RESULTS

For abnormal sentence classification, all models performed well (accuracies above 97.5 and F1 scores above 95.0). RadBERT variants achieved significantly higher scores than corresponding baselines when given only 10% or less of 12 458 annotated training sentences. For report coding, all variants outperformed baselines significantly for all five coding systems. The variant RadBERT-BioMed-RoBERTa performed the best among all models for report summarization, achieving a Recall-Oriented Understudy for Gisting Evaluation (ROUGE)-1 score of 16.18 compared with 15.27 by the corresponding baseline (BioMed-RoBERTa; P < .004).
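The summarization result is reported as a ROUGE-1 score, i.e. unigram overlap between the model-selected summary and a reference. ROUGE-1 has recall, precision, and F1 variants; as an illustration only (the paper's exact variant and tokenization are not stated in this abstract), here is the recall form:

```python
from collections import Counter

def rouge1_recall(candidate: str, reference: str) -> float:
    """ROUGE-1 recall: fraction of reference unigrams also found in the
    candidate, with per-word counts clipped to the candidate's counts."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    # Clipped overlap: a word matched at most as many times as it occurs
    # in the candidate.
    overlap = sum(min(cand[word], count) for word, count in ref.items())
    total = sum(ref.values())
    return overlap / total if total else 0.0
```

For example, a candidate covering one of three reference words scores 1/3; published ROUGE scores are conventionally reported scaled by 100, as in the 16.18 vs 15.27 comparison above.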

CONCLUSION

Transformer-based language models tailored to radiology improved performance on radiology NLP tasks compared with baseline transformer language models.

Keywords: Translation, Unsupervised Learning, Transfer Learning, Neural Networks, Informatics

© RSNA, 2022. See also the commentary by Wiggins and Tejani in this issue.


Figure: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9343/9344353/810617e78c45/ryai.210258.va.jpg

Similar Articles

1. RadBERT: Adapting Transformer-based Language Models to Radiology.
   Radiol Artif Intell. 2022 Jun 15;4(4):e210258. doi: 10.1148/ryai.210258. eCollection 2022 Jul.
2. Deep Learning Approach for Negation and Speculation Detection for Automated Important Finding Flagging and Extraction in Radiology Report: Internal Validation and Technique Comparison Study.
   JMIR Med Inform. 2023 Apr 25;11:e46348. doi: 10.2196/46348.
3. Domain-adapted Large Language Models for Classifying Nuclear Medicine Reports.
   Radiol Artif Intell. 2023 Sep 27;5(6):e220281. doi: 10.1148/ryai.220281. eCollection 2023 Nov.
4. Extracting Pulmonary Nodules and Nodule Characteristics from Radiology Reports of Lung Cancer Screening Patients Using Transformer Models.
   J Healthc Inform Res. 2024 May 17;8(3):463-477. doi: 10.1007/s41666-024-00166-5. eCollection 2024 Sep.
5. Multi-Label Classification in Patient-Doctor Dialogues With the RoBERTa-WWM-ext + CNN (Robustly Optimized Bidirectional Encoder Representations From Transformers Pretraining Approach With Whole Word Masking Extended Combining a Convolutional Neural Network) Model: Named Entity Study.
   JMIR Med Inform. 2022 Apr 21;10(4):e35606. doi: 10.2196/35606.
6. Automatic text classification of actionable radiology reports of tinnitus patients using bidirectional encoder representations from transformer (BERT) and in-domain pre-training (IDPT).
   BMC Med Inform Decis Mak. 2022 Jul 30;22(1):200. doi: 10.1186/s12911-022-01946-y.
7. Modified Bidirectional Encoder Representations From Transformers Extractive Summarization Model for Hospital Information Systems Based on Character-Level Tokens (AlphaBERT): Development and Performance Evaluation.
   JMIR Med Inform. 2020 Apr 29;8(4):e17787. doi: 10.2196/17787.
8. Few-Shot Learning for Clinical Natural Language Processing Using Siamese Neural Networks: Algorithm Development and Validation Study.
   JMIR AI. 2023 May 4;2:e44293. doi: 10.2196/44293.
9. BERT-based Transfer Learning in Sentence-level Anatomic Classification of Free-Text Radiology Reports.
   Radiol Artif Intell. 2023 Feb 15;5(2):e220097. doi: 10.1148/ryai.220097. eCollection 2023 Mar.
10. Performance of Multiple Pretrained BERT Models to Automate and Accelerate Data Annotation for Large Datasets.
   Radiol Artif Intell. 2022 Jun 29;4(4):e220007. doi: 10.1148/ryai.220007. eCollection 2022 Jul.

Cited By

1. From large language models to multimodal AI: a scoping review on the potential of generative AI in medicine.
   Biomed Eng Lett. 2025 Aug 22;15(5):845-863. doi: 10.1007/s13534-025-00497-1. eCollection 2025 Sep.
2. Knowledge-Informed Machine Learning for Cancer Diagnosis and Prognosis: A Review.
   IEEE Trans Autom Sci Eng. 2025;22:10008-10028. doi: 10.1109/tase.2024.3515839. Epub 2024 Dec 18.
3. Development of a Large-Scale Dataset of Chest Computed Tomography Reports in Japanese and a High-Performance Finding Classification Model: Dataset Development and Validation Study.
   JMIR Med Inform. 2025 Aug 28;13:e71137. doi: 10.2196/71137.
4. In-Context Learning with Large Language Models: A Simple and Effective Approach to Improve Radiology Report Labeling.
   Healthc Inform Res. 2025 Jul;31(3):295-309. doi: 10.4258/hir.2025.31.3.295. Epub 2025 Jul 31.
5. Evaluating acute image ordering for real-world patient cases via language model alignment with radiological guidelines.
   Commun Med (Lond). 2025 Aug 4;5(1):332. doi: 10.1038/s43856-025-01061-9.
6. A survey of NLP methods for oncology in the past decade with a focus on cancer registry applications.
   Artif Intell Rev. 2025;58(10):314. doi: 10.1007/s10462-025-11316-5. Epub 2025 Jul 16.
7. Fine-tuning of language models for automated structuring of medical exam reports to improve patient screening and analysis.
   Sci Rep. 2025 Jul 4;15(1):23949. doi: 10.1038/s41598-025-05695-6.
8. Clinical decision support using pseudo-notes from multiple streams of EHR data.
   NPJ Digit Med. 2025 Jul 2;8(1):394. doi: 10.1038/s41746-025-01777-x.
9. The DRAGON benchmark for clinical NLP.
   NPJ Digit Med. 2025 May 17;8(1):289. doi: 10.1038/s41746-025-01626-x.
10. Scientific Evidence for Clinical Text Summarization Using Large Language Models: Scoping Review.
   J Med Internet Res. 2025 May 15;27:e68998. doi: 10.2196/68998.

References

1. BioBERT: a pre-trained biomedical language representation model for biomedical text mining.
   Bioinformatics. 2020 Feb 15;36(4):1234-1240. doi: 10.1093/bioinformatics/btz682.
2. How user intelligence is improving PubMed.
   Nat Biotechnol. 2018 Oct 1. doi: 10.1038/nbt.4267.
3. Identifying and characterizing highly similar notes in big clinical note datasets.
   J Biomed Inform. 2018 Jun;82:63-69. doi: 10.1016/j.jbi.2018.04.009. Epub 2018 Apr 19.
4. Deep Learning to Classify Radiology Free-Text Reports.
   Radiology. 2018 Mar;286(3):845-852. doi: 10.1148/radiol.2017171115. Epub 2017 Nov 13.
5. MIMIC-III, a freely accessible critical care database.
   Sci Data. 2016 May 24;3:160035. doi: 10.1038/sdata.2016.35.
6. Preparing a collection of radiology examinations for distribution and retrieval.
   J Am Med Inform Assoc. 2016 Mar;23(2):304-10. doi: 10.1093/jamia/ocv080. Epub 2015 Jul 1.
7. Overview of BioCreative II gene mention recognition.
   Genome Biol. 2008;9 Suppl 2(Suppl 2):S2. doi: 10.1186/gb-2008-9-s2-s2. Epub 2008 Sep 1.
8. Automated de-identification of free-text medical records.
   BMC Med Inform Decis Mak. 2008 Jul 24;8:32. doi: 10.1186/1472-6947-8-32.
9. PhysioBank, PhysioToolkit, and PhysioNet: components of a new research resource for complex physiologic signals.
   Circulation. 2000 Jun 13;101(23):E215-20. doi: 10.1161/01.cir.101.23.e215.