Akcali Zafer, Cubuk Hazal Selvi, Oguz Arzu, Kocak Murat, Farzaliyeva Aydan, Guven Fatih, Ramazanoglu Mehmet Nezir, Hasdemir Efe, Altundag Ozden, Agildere Ahmet Muhtesem
Department of Medical Informatics, Faculty of Medicine, Baskent University, Ankara 06790, Türkiye.
Division of Medical Oncology, Department of Internal Medicine, Faculty of Medicine, Baskent University, Ankara 06790, Türkiye.
Bioengineering (Basel). 2025 Feb 10;12(2):168. doi: 10.3390/bioengineering12020168.
Named entity recognition (NER) offers a powerful method for automatically extracting key clinical information from text, but current models often lack sufficient support for non-English languages.
This study investigated a prompt-based NER approach using Google's Gemini 1.5 Pro, a large language model (LLM) with a 1.5-million-token context window. We focused on extracting key clinical entities from mammography reports written in Turkish, a language with limited natural language processing (NLP) tooling. Our method employed many-shot learning, incorporating 165 examples, derived from 75 initial reports, into a 26,000-token prompt. We tested the model on a separate set of 85 unannotated reports, concentrating on five key entity types: anatomy (ANAT), impression (IMP), observation presence (OBS-P), absence (OBS-A), and uncertainty (OBS-U).
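The many-shot setup described above can be sketched as follows. This is a minimal illustration, not the authors' code: the tag set matches the abstract, but the instruction wording, the `(report_text, entity_list)` example format, and the JSON output convention are assumptions.

```python
import json

# Entity types from the study; the surrounding prompt format is hypothetical.
ENTITY_TAGS = ["ANAT", "IMP", "OBS-P", "OBS-A", "OBS-U"]

def build_many_shot_prompt(examples, target_report):
    """Assemble one long prompt containing every annotated example
    (many-shot learning) followed by the unannotated report to label.

    examples: list of (report_text, entities) pairs, where entities is a
    list of {"span": ..., "tag": ...} dicts (an assumed annotation format).
    """
    parts = [
        "Extract clinical entities from the Turkish mammography report. "
        f"Use the tags {', '.join(ENTITY_TAGS)} and answer in JSON."
    ]
    for text, entities in examples:
        parts.append(
            f"Report: {text}\n"
            f"Entities: {json.dumps(entities, ensure_ascii=False)}"
        )
    # The target report is appended last, with its annotation left open
    # for the model to complete.
    parts.append(f"Report: {target_report}\nEntities:")
    return "\n\n".join(parts)
```

With 165 such examples, the assembled string would reach roughly the 26,000-token prompt length the study reports; the resulting prompt is then sent to the LLM as a single request.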
Our approach achieved high accuracy, with a macro-averaged F1 score of 0.99 for relaxed match and 0.84 for exact match. In relaxed matching, the model achieved F1 scores of 0.99 for ANAT, 0.99 for IMP, 1.00 for OBS-P, 1.00 for OBS-A, and 0.99 for OBS-U. For exact match, the F1 scores were 0.88 for ANAT, 0.79 for IMP, 0.78 for OBS-P, 0.94 for OBS-A, and 0.82 for OBS-U.
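The gap between the relaxed and exact scores reflects the matching criterion, which can be sketched as below. This is a generic illustration of the two criteria, not the authors' evaluation script: it assumes entities are character-offset spans, and that a relaxed match means any overlapping span with the same label while an exact match requires identical boundaries.

```python
def f1(tp, fp, fn):
    """Standard F1 from true-positive, false-positive, false-negative counts."""
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    return 2 * p * r / (p + r) if p + r else 0.0

def entity_f1(gold, pred, relaxed=False):
    """gold/pred: lists of (start, end, label) spans for one report.

    Exact match requires identical boundaries and label; relaxed match
    (an assumed definition) counts any character overlap with the same
    label. Each gold span may be matched at most once.
    """
    def matches(g, p):
        if g[2] != p[2]:
            return False
        if relaxed:
            return g[0] < p[1] and p[0] < g[1]  # spans overlap
        return g[0] == p[0] and g[1] == p[1]    # identical boundaries

    matched = set()
    tp = 0
    for p in pred:
        for i, g in enumerate(gold):
            if i not in matched and matches(g, p):
                matched.add(i)
                tp += 1
                break
    return f1(tp, len(pred) - tp, len(gold) - tp)
```

A prediction that clips one word off a gold span therefore still scores as a relaxed true positive but as both a false positive and a false negative under exact matching, which is why exact-match F1 is systematically lower.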
These results indicate that a many-shot prompt engineering approach with large language models provides an effective way to automate clinical information extraction for languages with less developed NLP resources and, consistent with reports in the literature, generally outperforms zero-shot, five-shot, and other few-shot methods.
This approach has the potential to significantly improve clinical workflows and research efforts in multilingual healthcare environments.