Extracting Pulmonary Nodules and Nodule Characteristics from Radiology Reports of Lung Cancer Screening Patients Using Transformer Models.

Author information

Yang Shuang, Yang Xi, Lyu Tianchen, Huang James L, Chen Aokun, He Xing, Braithwaite Dejana, Mehta Hiren J, Wu Yonghui, Guo Yi, Bian Jiang

Affiliations

Department of Health Outcomes and Biomedical Informatics, College of Medicine, University of Florida, Gainesville, FL USA.

Department of Pharmaceutical Outcomes and Policy, College of Pharmacy, University of Florida, Gainesville, FL USA.

Publication information

J Healthc Inform Res. 2024 May 17;8(3):463-477. doi: 10.1007/s41666-024-00166-5. eCollection 2024 Sep.

DOI:10.1007/s41666-024-00166-5
PMID:39131104
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11310180/
Abstract

Pulmonary nodules and nodule characteristics are important indicators of lung nodule malignancy. However, nodule information is often documented as free text in clinical narratives, such as radiology reports in electronic health record systems. Natural language processing (NLP) is the key technology for extracting patient information from radiology reports and standardizing it into structured data elements. This study aimed to develop an NLP system using state-of-the-art transformer models to extract pulmonary nodules and associated nodule characteristics from radiology reports. We identified a cohort of 3080 patients who underwent low-dose computed tomography (LDCT) at the University of Florida health system and collected their radiology reports. We manually annotated 394 reports as the gold standard. We explored eight pretrained transformer models from three transformer architectures, namely bidirectional encoder representations from transformers (BERT), the robustly optimized BERT approach (RoBERTa), and A Lite BERT (ALBERT), for clinical concept extraction, relation identification, and negation detection. We examined general transformer models pretrained on general English corpora, transformer models fine-tuned on a clinical corpus, and a large clinical transformer model, GatorTron, which was trained from scratch on 90 billion words of clinical text. We compared the transformer models with two baseline models: a recurrent neural network implemented as a bidirectional long short-term memory network with a conditional random fields layer, and support vector machines. RoBERTa-mimic achieved the best F1-score of 0.9279 for nodule concept and nodule characteristics extraction. ALBERT-base and GatorTron achieved the best F1-score of 0.9737 in linking nodule characteristics to pulmonary nodules. Seven of the eight transformers achieved the best F1-score of 1.0000 for negation detection. Our end-to-end system achieved an overall F1-score of 0.8869. This study demonstrated the advantage of state-of-the-art transformer models for extracting pulmonary nodule information from radiology reports.
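Every result in the abstract is reported as an F1-score. As a rough illustration of how such scores are commonly computed for concept extraction and relation identification, the sketch below decodes BIO tags into labelled spans and scores predicted spans, or (characteristic, nodule) pairs, by strict exact match. The BIO tagging scheme, the label names (`SIZE`, `TEXTURE`, `NODULE`, `LOCATION`), the helper names `bio_to_spans` and `prf1`, and the exact-match criterion are all hypothetical assumptions for illustration; the abstract does not state the study's annotation schema or matching rules.

```python
from typing import List, Optional, Set, Tuple

# Hypothetical span representation: (start token, end token exclusive, label).
Span = Tuple[int, int, str]


def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Decode a BIO tag sequence into a set of labelled spans (assumed scheme)."""
    spans: Set[Span] = set()
    start: Optional[int] = None
    label: Optional[str] = None
    # A trailing "O" sentinel closes any span still open at the end.
    for i, tag in enumerate(tags + ["O"]):
        # An I- tag with the same label extends the currently open span.
        if tag.startswith("I-") and tag[2:] == label:
            continue
        # Any other tag closes the currently open span, if there is one.
        if start is not None:
            spans.add((start, i, label))
            start, label = None, None
        # B- starts a span; a stray I- (no matching open span) also starts one.
        if tag.startswith(("B-", "I-")):
            start, label = i, tag[2:]
    return spans


def prf1(gold: Set, pred: Set) -> Tuple[float, float, float]:
    """Strict (exact-match) precision, recall, and F1 over sets of items."""
    tp = len(gold & pred)
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Made-up sentence: "A 6 mm solid nodule in the right upper lobe"
    gold_tags = ["O", "B-SIZE", "I-SIZE", "B-TEXTURE", "B-NODULE",
                 "O", "O", "B-LOCATION", "I-LOCATION", "I-LOCATION"]
    pred_tags = ["O", "B-SIZE", "I-SIZE", "O", "B-NODULE",
                 "O", "O", "B-LOCATION", "I-LOCATION", "O"]
    gold, pred = bio_to_spans(gold_tags), bio_to_spans(pred_tags)
    p, r, f = prf1(gold, pred)
    print(f"concepts:  P={p:.3f} R={r:.3f} F1={f:.3f}")  # P=0.667 R=0.500 F1=0.571

    # Relations: this toy sentence has one nodule, so each characteristic
    # span is paired with that nodule span.
    nodule = (4, 5, "NODULE")
    gold_rel = {(s, nodule) for s in gold if s[2] != "NODULE"}
    pred_rel = {(s, nodule) for s in pred if s[2] != "NODULE"}
    p, r, f = prf1(gold_rel, pred_rel)
    print(f"relations: P={p:.3f} R={r:.3f} F1={f:.3f}")  # P=0.500 R=0.333 F1=0.400
```

The handling of stray `I-` tags and the zero-division conventions are choices made for this sketch. Evaluation scripts differ on both, and some lenient schemes count partially overlapping spans as matches, so scores computed this way are not necessarily comparable to those reported above.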

SUPPLEMENTARY INFORMATION

The online version contains supplementary material available at 10.1007/s41666-024-00166-5.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/aa47/11310180/74cbf46060ec/41666_2024_166_Fig1_HTML.jpg

Similar articles

1
Extracting Pulmonary Nodules and Nodule Characteristics from Radiology Reports of Lung Cancer Screening Patients Using Transformer Models.
J Healthc Inform Res. 2024 May 17;8(3):463-477. doi: 10.1007/s41666-024-00166-5. eCollection 2024 Sep.
2
Identify diabetic retinopathy-related clinical concepts and their attributes using transformer-based natural language processing methods.
BMC Med Inform Decis Mak. 2022 Sep 27;22(Suppl 3):255. doi: 10.1186/s12911-022-01996-2.
3
A Preliminary Study of Extracting Pulmonary Nodules and Nodule Characteristics from Radiology Reports Using Natural Language Processing.
Proc (IEEE Int Conf Healthc Inform). 2022 Jun;2022:618-619. doi: 10.1109/ichi54592.2022.00125. Epub 2022 Sep 8.
4
Extracting Thyroid Nodules Characteristics from Ultrasound Reports Using Transformer-based Natural Language Processing Methods.
AMIA Annu Symp Proc. 2024 Jan 11;2023:1193-1200. eCollection 2023.
5
Clinical concept extraction using transformers.
J Am Med Inform Assoc. 2020 Dec 9;27(12):1935-1942. doi: 10.1093/jamia/ocaa189.
6
Deep Learning Approach for Negation and Speculation Detection for Automated Important Finding Flagging and Extraction in Radiology Report: Internal Validation and Technique Comparison Study.
JMIR Med Inform. 2023 Apr 25;11:e46348. doi: 10.2196/46348.
7
RadBERT: Adapting Transformer-based Language Models to Radiology.
Radiol Artif Intell. 2022 Jun 15;4(4):e210258. doi: 10.1148/ryai.210258. eCollection 2022 Jul.
8
Extracting comprehensive clinical information for breast cancer using deep learning methods.
Int J Med Inform. 2019 Dec;132:103985. doi: 10.1016/j.ijmedinf.2019.103985. Epub 2019 Oct 2.
9
Contextualized medication information extraction using Transformer-based deep learning architectures.
J Biomed Inform. 2023 Jun;142:104370. doi: 10.1016/j.jbi.2023.104370. Epub 2023 Apr 24.
10
Use of BERT (Bidirectional Encoder Representations from Transformers)-Based Deep Learning Method for Extracting Evidences in Chinese Radiology Reports: Development of a Computer-Aided Liver Cancer Diagnosis Framework.
J Med Internet Res. 2021 Jan 12;23(1):e19689. doi: 10.2196/19689.

Cited by

1
Patient and nodule characteristics associated with adherence to lung cancer screening in a large integrated healthcare system.
Sci Rep. 2025 Aug 9;15(1):29172. doi: 10.1038/s41598-025-15053-1.
2
Oxidative Stress and Inflammation in Hypoxemic Respiratory Diseases and Their Comorbidities: Molecular Insights and Diagnostic Advances in Chronic Obstructive Pulmonary Disease and Sleep Apnea.
Antioxidants (Basel). 2025 Jul 8;14(7):839. doi: 10.3390/antiox14070839.
3
Enhancing lung cancer detection through integrated deep learning and transformer models.
Sci Rep. 2025 May 4;15(1):15614. doi: 10.1038/s41598-025-00516-2.
4
Mapping the Advanced-Stage Epithelial Ovarian Cancer Landscape Goes Beyond Words: Two Large Language Models, Eight Tasks, One Journey.
J Clin Med. 2025 Mar 25;14(7):2223. doi: 10.3390/jcm14072223.