Automated Extraction of Key Entities from Non-English Mammography Reports Using Named Entity Recognition with Prompt Engineering.

Authors

Akcali Zafer, Cubuk Hazal Selvi, Oguz Arzu, Kocak Murat, Farzaliyeva Aydan, Guven Fatih, Ramazanoglu Mehmet Nezir, Hasdemir Efe, Altundag Ozden, Agildere Ahmet Muhtesem

Affiliations

Department of Medical Informatics, Faculty of Medicine, Baskent University, Ankara 06790, Türkiye.

Division of Medical Oncology, Department of Internal Medicine, Faculty of Medicine, Baskent University, Ankara 06790, Türkiye.

Publication information

Bioengineering (Basel). 2025 Feb 10;12(2):168. doi: 10.3390/bioengineering12020168.

DOI:10.3390/bioengineering12020168
PMID:40001686
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11852152/

Abstract

OBJECTIVE

Named entity recognition (NER) offers a powerful method for automatically extracting key clinical information from text, but current models often lack sufficient support for non-English languages.

MATERIALS AND METHODS

This study investigated a prompt-based NER approach using Google's Gemini 1.5 Pro, a large language model (LLM) with a 1.5-million-token context window. We focused on extracting important clinical entities from mammography reports written in Turkish, a language with limited available natural language processing (NLP) tools. Our method employed many-shot learning, incorporating 165 examples within a 26,000-token prompt derived from 75 initial reports. We tested the model on a separate set of 85 unannotated reports, concentrating on five key entities: anatomy (ANAT), impression (IMP), observation presence (OBS-P), observation absence (OBS-A), and observation uncertainty (OBS-U).
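
The many-shot setup described above can be pictured as plain string assembly: annotated examples are concatenated ahead of the unannotated report. The following is a minimal sketch only. The `Example` class, the `span -> label` output format, the instruction wording, and the English sample sentence are all illustrative assumptions; the abstract does not give the authors' actual prompt, and no model call is shown.

```python
from dataclasses import dataclass

# The five entity labels named in the abstract.
LABELS = ("ANAT", "IMP", "OBS-P", "OBS-A", "OBS-U")


@dataclass
class Example:
    """One annotated example (hypothetical structure, not from the paper)."""
    text: str                         # sentence taken from an annotated report
    entities: list[tuple[str, str]]   # (span text, label) pairs


def format_example(ex: Example) -> str:
    """Render one example as a report followed by its labeled spans."""
    for _, label in ex.entities:
        if label not in LABELS:
            raise ValueError(f"unknown label: {label}")
    tagged = "\n".join(f"{span} -> {label}" for span, label in ex.entities)
    return f"Report: {ex.text}\nEntities:\n{tagged}"


def build_many_shot_prompt(examples: list[Example], report: str) -> str:
    """Place all examples before the new report (assumed layout)."""
    header = (
        "Extract entities from the mammography report. "
        "Allowed labels: " + ", ".join(LABELS) + "."
    )
    shots = "\n\n".join(format_example(ex) for ex in examples)
    return f"{header}\n\n{shots}\n\nReport: {report}\nEntities:\n"


if __name__ == "__main__":
    # Invented English sample; the study itself used Turkish reports.
    sample = Example(
        "No suspicious mass in the right breast.",
        [("right breast", "ANAT"), ("suspicious mass", "OBS-A")],
    )
    print(build_many_shot_prompt([sample], "Text of an unannotated report."))
```

In the study, 165 such examples filled a prompt of roughly 26,000 tokens, which fits easily in a long-context model; the sketch does not count tokens.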

RESULTS

Our approach achieved high accuracy, with a macro-averaged F1 score of 0.99 for relaxed match and 0.84 for exact match. In relaxed matching, the model achieved F1 scores of 0.99 for ANAT, 0.99 for IMP, 1.00 for OBS-P, 1.00 for OBS-A, and 0.99 for OBS-U. For exact match, the F1 scores were 0.88 for ANAT, 0.79 for IMP, 0.78 for OBS-P, 0.94 for OBS-A, and 0.82 for OBS-U.
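
The gap between the relaxed and exact scores comes from how a predicted span is counted as correct. The sketch below assumes one common convention: an exact match needs identical boundaries and label, while a relaxed match needs only an overlapping span with the same label. The abstract does not define its matching rules, so this convention, the function names, and the character offsets are assumptions. The final lines recompute the macro averages from the per-entity F1 scores reported above.

```python
def f1_score(gold, pred, relaxed=False):
    """F1 over entity spans given as (start, end, label), end exclusive."""
    def matches(a, b):
        if a[2] != b[2]:
            return False
        if relaxed:
            # Any character overlap counts (assumed relaxed criterion).
            return a[0] < b[1] and b[0] < a[1]
        return a[0] == b[0] and a[1] == b[1]

    # Predictions that hit some gold span, and gold spans that were hit.
    hit_pred = sum(any(matches(p, g) for g in gold) for p in pred)
    hit_gold = sum(any(matches(g, p) for p in pred) for g in gold)
    precision = hit_pred / len(pred) if pred else 0.0
    recall = hit_gold / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def macro_average(scores):
    """Unweighted mean of per-entity scores."""
    return sum(scores.values()) / len(scores)


# Per-entity F1 scores as reported in the abstract.
reported_relaxed = {"ANAT": 0.99, "IMP": 0.99, "OBS-P": 1.00, "OBS-A": 1.00, "OBS-U": 0.99}
reported_exact = {"ANAT": 0.88, "IMP": 0.79, "OBS-P": 0.78, "OBS-A": 0.94, "OBS-U": 0.82}

if __name__ == "__main__":
    # Invented offsets: the second prediction is cut three characters short.
    gold = [(0, 10, "ANAT"), (15, 25, "OBS-P")]
    pred = [(0, 10, "ANAT"), (15, 22, "OBS-P")]
    print(f1_score(gold, pred))                 # 0.5 under exact matching
    print(f1_score(gold, pred, relaxed=True))   # 1.0 under relaxed matching
    print(round(macro_average(reported_relaxed), 2))  # 0.99
    print(round(macro_average(reported_exact), 2))    # 0.84
```

A prediction with a slightly wrong boundary is penalized only under exact matching, which is consistent with the larger drop for entity types such as IMP and OBS-P.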

DISCUSSION

These results indicate that many-shot prompt engineering with large language models is an effective way to automate clinical information extraction in languages with less-developed NLP resources. As reported in the literature, many-shot prompting generally outperforms zero-shot, five-shot, and other few-shot methods.

CONCLUSION

This approach has the potential to significantly improve clinical workflows and research efforts in multilingual healthcare environments.
