Comparison of active learning algorithms in classifying head computed tomography reports using bidirectional encoder representations from transformers.

Author Information

Wataya Tomohiro, Miura Azusa, Sakisuka Takahisa, Fujiwara Masahiro, Tanaka Hisashi, Hiraoka Yu, Sato Junya, Tomiyama Miyuki, Nishigaki Daiki, Kita Kosuke, Suzuki Yuki, Kido Shoji, Tomiyama Noriyuki

Affiliations

Department of Radiology, Osaka University Graduate School of Medicine, 2-2, Yamadaoka, Suita, Osaka, 565-0871, Japan.

Department of Artificial Intelligence Diagnostic Radiology, Osaka University Graduate School of Medicine, 2-2, Yamadaoka, Suita, Osaka, 565-0871, Japan.

Publication Information

Int J Comput Assist Radiol Surg. 2025 Apr;20(4):687-701. doi: 10.1007/s11548-024-03316-7. Epub 2025 Jan 8.

Abstract

PURPOSE

Systems equipped with natural language processing (NLP) can reduce radiological findings missed by physicians, but annotation costs are a burden in their development. This study aimed to compare the effects of active learning (AL) algorithms in NLP for estimating the significance of head computed tomography (CT) reports using bidirectional encoder representations from transformers (BERT).

METHODS

A total of 3728 head CT reports annotated with five categories of importance were used, and UTH-BERT was adopted as the pre-trained BERT model. We assumed that 64% (2385 reports) of the data were initially in the unlabeled data pool (UDP), while the labeled dataset (LD) used to train the model was empty. Twenty-five reports were repeatedly selected from the UDP and added to the LD based on seven metrics: random sampling (RS: control), four uncertainty sampling (US) methods (least confidence (LC), margin sampling (MS), ratio of confidence (RC), and entropy sampling (ES)), and two distance-based sampling (DS) methods (cosine distance (CD) and Euclidean distance (ED)). The transition of the model's accuracy was evaluated using the test dataset.
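To make the seven selection metrics concrete, the sketch below scores candidate reports in one AL iteration. It is a minimal illustration under our own assumptions (NumPy arrays of softmax probabilities from the fine-tuned BERT model and of report embeddings), not the authors' code; in particular, the distance-based criterion shown (distance to the nearest already-labeled report) is one common formulation and may differ from the paper's.

import numpy as np

BATCH_SIZE = 25  # reports moved from the UDP to the LD per iteration


def uncertainty_scores(probs, method):
    """Higher score = more uncertain; probs has shape (n_reports, n_classes)."""
    p = np.sort(probs, axis=1)[:, ::-1]            # per-report probabilities, descending
    if method == "LC":                             # least confidence
        return 1.0 - p[:, 0]
    if method == "MS":                             # margin sampling
        return -(p[:, 0] - p[:, 1])                # small top-2 margin = uncertain
    if method == "RC":                             # ratio of confidence
        return p[:, 1] / p[:, 0]                   # ratio near 1 = uncertain
    if method == "ES":                             # entropy sampling
        return -np.sum(probs * np.log(probs + 1e-12), axis=1)
    raise ValueError(method)


def distance_scores(unlabeled_emb, labeled_emb, method):
    """Higher score = farther from the nearest already-labeled report."""
    if method == "CD":                             # cosine distance
        u = unlabeled_emb / np.linalg.norm(unlabeled_emb, axis=1, keepdims=True)
        v = labeled_emb / np.linalg.norm(labeled_emb, axis=1, keepdims=True)
        return (1.0 - u @ v.T).min(axis=1)
    if method == "ED":                             # Euclidean distance
        d = unlabeled_emb[:, None, :] - labeled_emb[None, :, :]
        return np.linalg.norm(d, axis=2).min(axis=1)
    raise ValueError(method)


def select_batch(scores, batch_size=BATCH_SIZE):
    """Indices of the top-scoring reports to annotate next; RS (the control)
    would instead draw batch_size indices at random."""
    return np.argsort(scores)[::-1][:batch_size]

In each iteration the selected 25 reports would be labeled, moved from the UDP to the LD, and the model re-fine-tuned before the next selection, as described above.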

RESULTS

The accuracy of the models with US methods was significantly higher than that with RS when the LD contained fewer than 1800 reports, whereas the accuracy with DS methods was significantly lower than that with RS. Among the US methods, MS and RC outperformed the others. With the US methods, the required labeled data decreased by 15.4-40.5%, with RC being the most efficient. In addition, with the US methods, data from minor categories tended to be added to the LD earlier than with RS and DS.
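The abstract does not spell out how the 15.4-40.5% figure is computed; a natural reading (our assumption) is the relative reduction in labeled reports needed to match the accuracy reached with RS: reduction (%) = 100 × (N_RS − N_US) / N_RS, where N_RS and N_US are the numbers of labeled reports required by RS and by a US method, respectively, to reach the same test accuracy.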

CONCLUSIONS

In the classification task for the importance of head CT reports, US methods, especially RC and MS, can lead to effective fine-tuning of BERT models and reduce category imbalance. AL can contribute to other studies on larger datasets by providing effective annotation.

Fig. 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ef82/12034600/4626657dd6b5/11548_2024_3316_Fig1_HTML.jpg
