Automatic text classification of actionable radiology reports of tinnitus patients using bidirectional encoder representations from transformer (BERT) and in-domain pre-training (IDPT).

Affiliations

Department of Radiology, Beijing Friendship Hospital, Capital Medical University, No. 95 YongAn Road, Beijing, 100050, People's Republic of China.

School of Medical Technology, Beijing Institute of Technology, No.5 Zhongguancun East Road, Beijing, 100050, People's Republic of China.

Publication information

BMC Med Inform Decis Mak. 2022 Jul 30;22(1):200. doi: 10.1186/s12911-022-01946-y.


DOI:10.1186/s12911-022-01946-y
PMID:35907966
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9338483/
Abstract

BACKGROUND: Given the increasing number of people suffering from tinnitus, accurately categorizing patients with actionable reports could assist clinical decision making. However, this process requires experienced physicians and significant human labor. Natural language processing (NLP) has shown great potential in big-data analytics of medical texts, yet its application to domain-specific analysis of radiology reports remains limited.

OBJECTIVE: This study proposes a novel approach to classifying actionable radiology reports of tinnitus patients using bidirectional encoder representations from transformers (BERT)-based models, and evaluates the benefits of in-domain pre-training (IDPT) together with a sequence adaptation strategy.

METHODS: A total of 5864 temporal bone computed tomography (CT) reports were labeled by two experienced radiologists as follows: (1) normal findings without notable lesions; (2) notable lesions uncorrelated with tinnitus; and (3) at least one lesion considered a potential cause of tinnitus. We then constructed a framework consisting of deep learning (DL) neural networks and self-supervised BERT models. A tinnitus domain-specific corpus was used to pre-train the BERT model to further improve its embedding weights. In addition, we conducted an experiment evaluating multiple max-sequence-length settings in BERT to reduce the amount of computation. After a comprehensive comparison of all metrics, we determined the most promising approach by comparing F1-scores and AUC values.

RESULTS: In the first experiment, the BERT fine-tuning model achieved a more promising result (AUC 0.868, F1 0.760) than the Word2Vec-based models (AUC 0.767, F1 0.733) on validation data. In the second experiment, the BERT in-domain pre-training model (AUC 0.948, F1 0.841) performed significantly better than the BERT-based model (AUC 0.868, F1 0.760). Additionally, among the variants of BERT fine-tuning models, Mengzi achieved the highest AUC of 0.878 (F1 0.764). Finally, we found that a BERT max-sequence-length of 128 tokens achieved an AUC of 0.866 (F1 0.736), almost equal to that of a max-sequence-length of 512 tokens (AUC 0.868, F1 0.760).

CONCLUSION: We developed a reliable BERT-based framework for tinnitus diagnosis from Chinese radiology reports, along with a sequence adaptation strategy that reduces computational resources while maintaining accuracy. The findings could provide a reference for NLP development in Chinese radiology reports.
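
The abstract compares models by F1-score and AUC over three report categories but does not say how the per-class values are combined. The following is a minimal sketch, assuming macro averaging and a one-vs-rest AUC; the function names, the label encoding `(1, 2, 3)`, and the toy data are illustrative assumptions and are not taken from the paper.

```python
from typing import List, Sequence

# Assumed encoding of the three report categories described in METHODS.
LABELS = (1, 2, 3)


def f1_for_label(y_true: Sequence[int], y_pred: Sequence[int], label: int) -> float:
    """F1-score of one class, treating every other class as negative."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    if tp == 0:
        return 0.0  # no true positives: precision or recall is zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(y_true: Sequence[int], y_pred: Sequence[int],
             labels: Sequence[int] = LABELS) -> float:
    """Unweighted mean of the per-class F1-scores (assumed averaging scheme)."""
    return sum(f1_for_label(y_true, y_pred, lab) for lab in labels) / len(labels)


def binary_auc(is_positive: Sequence[bool], scores: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for flag, s in zip(is_positive, scores) if flag]
    neg = [s for flag, s in zip(is_positive, scores) if not flag]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def macro_ovr_auc(y_true: Sequence[int], probs: Sequence[Sequence[float]],
                  labels: Sequence[int] = LABELS) -> float:
    """Mean one-vs-rest AUC; probs[i][k] is the score of labels[k] for sample i."""
    aucs: List[float] = []
    for k, lab in enumerate(labels):
        flags = [t == lab for t in y_true]
        aucs.append(binary_auc(flags, [row[k] for row in probs]))
    return sum(aucs) / len(aucs)


if __name__ == "__main__":
    # Toy labels and predictions, not data from the study.
    y_true = [1, 1, 2, 2, 3, 3]
    y_pred = [1, 2, 2, 2, 3, 1]
    print(f"macro F1 = {macro_f1(y_true, y_pred):.3f}")
```

Macro averaging weights the three categories equally regardless of how many reports fall into each; a weighted or micro average would give different numbers on imbalanced data, so the reported values should be read against whichever scheme the full paper specifies.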

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f0da/9338483/86950ce57635/12911_2022_1946_Fig1_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f0da/9338483/ca78db63bd33/12911_2022_1946_Fig2_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f0da/9338483/0080354dd21d/12911_2022_1946_Fig3_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f0da/9338483/343b050f0af4/12911_2022_1946_Fig4_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f0da/9338483/f5c59bf17630/12911_2022_1946_Fig5_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f0da/9338483/9de1fca9db10/12911_2022_1946_Fig6_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f0da/9338483/7d853a4a65b9/12911_2022_1946_Fig7_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f0da/9338483/0c2fb260363b/12911_2022_1946_Fig8_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f0da/9338483/3d1c54cce516/12911_2022_1946_Fig9_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f0da/9338483/7efc559f41dc/12911_2022_1946_Fig10_HTML.jpg

Similar articles

[1]
Automatic text classification of actionable radiology reports of tinnitus patients using bidirectional encoder representations from transformer (BERT) and in-domain pre-training (IDPT).

BMC Med Inform Decis Mak. 2022-7-30

[2]
Automatic detection of actionable radiology reports using bidirectional encoder representations from transformers.

BMC Med Inform Decis Mak. 2021-9-11

[3]
Use of BERT (Bidirectional Encoder Representations from Transformers)-Based Deep Learning Method for Extracting Evidences in Chinese Radiology Reports: Development of a Computer-Aided Liver Cancer Diagnosis Framework.

J Med Internet Res. 2021-1-12

[4]
A comparative study on deep learning models for text classification of unstructured medical notes with various levels of class imbalance.

BMC Med Res Methodol. 2022-7-2

[5]
Deep Learning Approach for Negation and Speculation Detection for Automated Important Finding Flagging and Extraction in Radiology Report: Internal Validation and Technique Comparison Study.

JMIR Med Inform. 2023-4-25

[6]
RadBERT: Adapting Transformer-based Language Models to Radiology.

Radiol Artif Intell. 2022-6-15

[7]
Information extraction from weakly structured radiological reports with natural language queries.

Eur Radiol. 2024-1

[8]
Classifying social determinants of health from unstructured electronic health records using deep learning-based natural language processing.

J Biomed Inform. 2022-3

[9]
Development and External Validation of an Artificial Intelligence Model for Identifying Radiology Reports Containing Recommendations for Additional Imaging.

AJR Am J Roentgenol. 2023-9

[10]
Comparison of an Ensemble of Machine Learning Models and the BERT Language Model for Analysis of Text Descriptions of Brain CT Reports to Determine the Presence of Intracranial Hemorrhage.

Sovrem Tekhnologii Med. 2024

Cited by

[1]
Applications of Natural Language Processing in Otolaryngology: A Scoping Review.

Laryngoscope. 2025-9

[2]
Comparison of an Ensemble of Machine Learning Models and the BERT Language Model for Analysis of Text Descriptions of Brain CT Reports to Determine the Presence of Intracranial Hemorrhage.

Sovrem Tekhnologii Med. 2024

[3]
Evaluating the performance of Generative Pre-trained Transformer-4 (GPT-4) in standardizing radiology reports.

Eur Radiol. 2024-6

[4]
Applications of Artificial Intelligence in Temporal Bone Imaging: Advances and Future Challenges.

Cureus. 2023-9-2

[5]
An interpretable deep learning framework for predicting liver metastases in postoperative colorectal cancer patients using natural language processing and clinical data integration.

Cancer Med. 2023-9

[6]
Using a classification model for determining the value of liver radiological reports of patients with colorectal cancer.

Front Oncol. 2022-11-21

References

[1]
Automatic Diagnosis Labeling of Cardiovascular MRI by Using Semisupervised Natural Language Processing of Text Reports.

Radiol Artif Intell. 2021-11-24

[2]
A Comparison of Natural Language Processing Methods for the Classification of Lumbar Spine Imaging Findings Related to Lower Back Pain.

Acad Radiol. 2022-3

[3]
Basic Artificial Intelligence Techniques: Natural Language Processing of Radiology Reports.

Radiol Clin North Am. 2021-11

[4]
Automatic detection of actionable radiology reports using bidirectional encoder representations from transformers.

BMC Med Inform Decis Mak. 2021-9-11

[5]
Practical Guide to Natural Language Processing for Radiology.

Radiographics. 2021

[6]
Qualifying Certainty in Radiology Reports through Deep Learning-Based Natural Language Processing.

AJNR Am J Neuroradiol. 2021-10

[7]
BERT for the Processing of Radiological Reports: An Attention-based Natural Language Processing Algorithm.

Acad Radiol. 2022-4

[8]
A novel deep learning approach to extract Chinese clinical entities for lung cancer screening and staging.

BMC Med Inform Decis Mak. 2021-7-30

[9]
Use of BERT (Bidirectional Encoder Representations from Transformers)-Based Deep Learning Method for Extracting Evidences in Chinese Radiology Reports: Development of a Computer-Aided Liver Cancer Diagnosis Framework.

J Med Internet Res. 2021-1-12

[10]
Domain specific word embeddings for natural language processing in radiology.

J Biomed Inform. 2021-1
