

From text to data: Open-source large language models in extracting cancer related medical attributes from German pathology reports.

Author Information

Bartels Stefan, Carus Jasmin

Affiliations

University Medical Center Hamburg-Eppendorf / University Cancer Center, Martinistr. 52, Hamburg, 22767, Germany.

Publication Information

Int J Med Inform. 2025 Nov;203:106022. doi: 10.1016/j.ijmedinf.2025.106022. Epub 2025 Jul 2.

DOI: 10.1016/j.ijmedinf.2025.106022
PMID: 40609461
Abstract

Structured oncological documentation is vital for data-driven cancer care, yet extracting clinical features from unstructured pathology reports remains challenging, especially in German healthcare, where strict data protection rules require local model deployment. This study evaluates open-source large language models (LLMs) for extracting oncological attributes from German pathology reports in a secure, on-premise setting. We created a gold-standard dataset of 522 annotated reports and developed a retrieval-augmented generation (RAG) pipeline using an additional 15,000 pathology reports. Five instruction-tuned LLMs (Llama 3.3 70B, Mistral Small 24B, and three SauerkrautLM variants) were evaluated using three prompting strategies: zero-shot, few-shot, and RAG-enhanced few-shot prompting. All models produced structured JSON outputs and were assessed using entity-level precision, recall, accuracy, and macro-averaged F1-score. Results show that Llama 3.3 70B achieved the highest overall performance (F1 > 0.90). However, when combined with the RAG pipeline, Mistral Small 24B achieved nearly equivalent performance, matching Llama 70B on most entity types while requiring significantly fewer computational resources. Prompting strategy significantly impacted performance: few-shot prompting improved baseline accuracy, and RAG further enhanced performance, particularly for models with fewer than 24B parameters. Challenges remained in extracting less frequent but clinically critical attributes like metastasis and staging, underscoring the importance of retrieval mechanisms and balanced training data. This study demonstrates that open-source LLMs, when paired with effective prompting and retrieval strategies, can enable high-quality, privacy-compliant extraction of oncological information from unstructured text. The finding that smaller models can match larger ones through retrieval augmentation highlights a path toward scalable, resource-efficient deployment in German clinical settings.
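The abstract's evaluation scheme (entity-level precision, recall, and macro-averaged F1 over structured extraction outputs) can be sketched as follows. The tuple schema and entity-type names (`tnm_t`, `grading`, `metastasis`) are illustrative assumptions for this sketch, not the authors' actual data model.

```python
from collections import defaultdict

def entity_scores(gold, pred):
    """Per-entity-type precision, recall, F1 from lists of
    (report_id, entity_type, value) tuples (illustrative schema)."""
    tp, fp, fn = defaultdict(int), defaultdict(int), defaultdict(int)
    gold_set, pred_set = set(gold), set(pred)
    for item in pred_set:                # predicted attributes
        etype = item[1]
        if item in gold_set:
            tp[etype] += 1               # exact match with gold standard
        else:
            fp[etype] += 1               # hallucinated or wrong value
    for item in gold_set - pred_set:
        fn[item[1]] += 1                 # missed gold attribute
    scores = {}
    for etype in set(tp) | set(fp) | set(fn):
        p = tp[etype] / (tp[etype] + fp[etype]) if tp[etype] + fp[etype] else 0.0
        r = tp[etype] / (tp[etype] + fn[etype]) if tp[etype] + fn[etype] else 0.0
        f1 = 2 * p * r / (p + r) if p + r else 0.0
        scores[etype] = (p, r, f1)
    return scores

def macro_f1(scores):
    """Macro average: each entity type weighs equally, so rare but
    clinically critical attributes (e.g. metastasis) are not drowned out."""
    return sum(f1 for _, _, f1 in scores.values()) / len(scores)
```

The macro average is what makes the reported difficulty with infrequent attributes visible: a model can score well on common entities and still lose macro-F1 on rare ones.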

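As a rough illustration of RAG-enhanced few-shot prompting, the sketch below retrieves the annotated reports most similar to the input and splices them into the prompt as few-shot examples. The token-overlap retriever, prompt wording, and attribute keys are stand-ins; the abstract does not specify the paper's actual retrieval mechanism or prompt format.

```python
import json

def jaccard(a: str, b: str) -> float:
    """Token-overlap similarity; a placeholder for a real retriever
    (e.g. dense embeddings over the 15,000-report corpus)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0

def build_rag_prompt(report: str, annotated_corpus, k: int = 2) -> str:
    """annotated_corpus: (report_text, gold_attributes_dict) pairs.
    The k most similar annotated reports become few-shot examples,
    so the model sees worked extractions close to the input's domain."""
    ranked = sorted(annotated_corpus,
                    key=lambda ex: jaccard(report, ex[0]), reverse=True)
    parts = ["Extract the oncological attributes from the report as JSON."]
    for text, gold in ranked[:k]:
        parts.append(f"Report: {text}\nJSON: {json.dumps(gold, ensure_ascii=False)}")
    parts.append(f"Report: {report}\nJSON:")
    return "\n\n".join(parts)
```

Grounding the few-shot examples in retrieved neighbors rather than fixed examples is what lets a smaller model such as Mistral Small 24B close the gap to Llama 3.3 70B in the reported results.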

Similar Articles

1. Data extraction from free-text stroke CT reports using GPT-4o and Llama-3.3-70B: the impact of annotation guidelines.
Eur Radiol Exp. 2025 Jun 19;9(1):61. doi: 10.1186/s41747-025-00600-2.
2. Can open source large language models be used for tumor documentation in Germany? An evaluation on urological doctors' notes.
BioData Min. 2025 Jul 24;18(1):48. doi: 10.1186/s13040-025-00463-8.
3. Predicting 30-Day Postoperative Mortality and American Society of Anesthesiologists Physical Status Using Retrieval-Augmented Large Language Models: Development and Validation Study.
J Med Internet Res. 2025 Jun 3;27:e75052. doi: 10.2196/75052.
4. Utilizing large language models for detecting hospital-acquired conditions: an empirical study on pulmonary embolism.
J Am Med Inform Assoc. 2025 May 1;32(5):876-884. doi: 10.1093/jamia/ocaf048.
5. Enhancing Pulmonary Disease Prediction Using Large Language Models With Feature Summarization and Hybrid Retrieval-Augmented Generation: Multicenter Methodological Study Based on Radiology Report.
J Med Internet Res. 2025 Jun 11;27:e72638. doi: 10.2196/72638.
6. From BERT to generative AI - Comparing encoder-only vs. large language models in a cohort of lung cancer patients for named entity recognition in unstructured medical reports.
Comput Biol Med. 2025 Sep;195:110665. doi: 10.1016/j.compbiomed.2025.110665. Epub 2025 Jun 24.
7. Extracting epilepsy-related information from unstructured clinic letters using large language models.
Epilepsia. 2025 Jul 10. doi: 10.1111/epi.18475.
8. Assessing Retrieval-Augmented Large Language Model Performance in Emergency Department ICD-10-CM Coding Compared to Human Coders.
medRxiv. 2024 Oct 17:2024.10.15.24315526. doi: 10.1101/2024.10.15.24315526.
9. Large language models for data extraction from unstructured and semi-structured electronic health records: a multiple model performance evaluation.
BMJ Health Care Inform. 2025 Jan 19;32(1):e101139. doi: 10.1136/bmjhci-2024-101139.