Arzideh Kamyar, Schäfer Henning, Allende-Cid Héctor, Baldini Giulia, Hilser Thomas, Idrissi-Yaghir Ahmad, Laue Katharina, Chakraborty Nilesh, Doll Niclas, Antweiler Dario, Klug Katrin, Beck Niklas, Giesselbach Sven, Friedrich Christoph M, Nensa Felix, Schuler Martin, Hosch René
Institute for Artificial Intelligence in Medicine, University Hospital Essen, Essen, Germany; Central IT Department, Data Integration Center, University Hospital Essen, Essen, Germany.
Institute for Transfusion Medicine, University Hospital Essen, Essen, Germany; Department of Computer Science, University of Applied Sciences and Arts Dortmund, Dortmund, Germany.
Comput Biol Med. 2025 Sep;195:110665. doi: 10.1016/j.compbiomed.2025.110665. Epub 2025 Jun 24.
Extracting clinical entities from unstructured medical documents is critical for improving clinical decision support and documentation workflows. This study evaluates the performance of various encoder and decoder models trained for Named Entity Recognition (NER) of clinical parameters in pathology and radiology reports, with particular attention to the applicability of Large Language Models (LLMs) to this task.
Three NER methods were evaluated: (1) flat NER using transformer-based models, (2) nested NER with a multi-task learning setup, and (3) instruction-based NER using LLMs. A dataset of 2,013 pathology reports and 413 radiology reports, annotated by medical students, was used for training and testing.
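For orientation, the first approach reduces to standard transformer token classification. Below is a minimal sketch, assuming the Hugging Face transformers library; the checkpoint name and report text are illustrative assumptions, not the models or data used in the study (a biomedical checkpoint would match the paper's setting more closely).

```python
# Minimal sketch of flat NER via transformer token classification.
# Checkpoint and example text are illustrative assumptions only.
from transformers import pipeline

ner = pipeline(
    "token-classification",
    model="dslim/bert-base-NER",      # any token-classification checkpoint works here
    aggregation_strategy="simple",    # merge sub-word tokens into entity spans
)

report = "Invasive ductal carcinoma, grade 2, ER positive, PR negative."
for entity in ner(report):
    print(entity["entity_group"], entity["word"], round(entity["score"], 3))
```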
The encoder-based NER models (flat and nested) outperformed the LLM-based approaches. The best-performing flat NER models achieved F1-scores of 0.87-0.88 on pathology reports and up to 0.78 on radiology reports, while nested NER models performed slightly lower. In contrast, multiple LLMs achieved high precision but markedly lower F1-scores (0.18 to 0.30) because of poor recall. A contributing factor appears to be that these LLMs produce fewer but more accurate entity predictions, suggesting they become overly conservative when generating outputs.
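For intuition, the F1-score is the harmonic mean of precision and recall, so high precision cannot offset poor recall. The precision and recall values below are illustrative assumptions; only the F1 range is reported in the study:

\[
F_1 = \frac{2PR}{P + R}, \qquad P = 0.90,\ R = 0.15 \;\Rightarrow\; F_1 = \frac{2 \cdot 0.90 \cdot 0.15}{0.90 + 0.15} \approx 0.26
\]

Even with precision near 0.9, a recall of 0.15 pins the F1-score inside the reported 0.18-0.30 band.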
In their current form, LLMs are unsuitable for comprehensive entity extraction tasks in clinical domains, particularly when a document contains a high number of entity types, although instructing them to return more entities in subsequent refinement turns may improve recall (a sketch of such a refinement follows below). Additionally, their computational overhead does not yield proportional performance gains. Encoder-based NER models, particularly those pre-trained on biomedical data, remain the preferred choice for extracting information from unstructured medical documents.
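The refinement idea mentioned above could look like the following. This is a hedged sketch assuming the OpenAI Python SDK and a generic chat model; the study's actual prompts, models, and entity schema are not specified here.

```python
# Hedged sketch of instruction-based NER with a recall-oriented
# refinement turn. Model name, prompts, and client are assumptions,
# not the study's setup.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

report = "Invasive ductal carcinoma, grade 2, ER positive, PR negative."
messages = [
    {"role": "system",
     "content": 'Extract all clinical entities as a JSON list of '
                '{"type": ..., "text": ...} objects.'},
    {"role": "user", "content": report},
]
first = client.chat.completions.create(model="gpt-4o-mini", messages=messages)

# Refinement turn: explicitly push the model toward higher recall.
messages += [
    {"role": "assistant", "content": first.choices[0].message.content},
    {"role": "user",
     "content": "You likely missed some entities. Re-read the report and "
                "return the complete list, including every mention."},
]
second = client.chat.completions.create(model="gpt-4o-mini", messages=messages)
print(second.choices[0].message.content)
```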