Dynamic few-shot prompting for clinical note section classification using lightweight, open-source large language models.

Authors

Miller Kurt, Bedrick Steven, Lu Qiuhao, Wen Andrew, Hersh William, Roberts Kirk, Liu Hongfang

Affiliations

Bioinformatics and Computational Biology Program, University of Minnesota, Rochester, MN, United States.

Center for Digital Health, Mayo Clinic, Rochester, MN, United States.

Publication information

J Am Med Inform Assoc. 2025 Jul 1;32(7):1164-1173. doi: 10.1093/jamia/ocaf084.

Abstract

OBJECTIVE

Efforts to unlock the clinical information embedded in clinical notes have been significantly hindered by domain-specific and context-sensitive language. Identification of note sections and structural document elements has been shown to improve information extraction and the downstream clinical natural language processing (NLP) tasks and applications that depend on it. This study investigates the viability of a dynamic example selection prompting method for section classification using lightweight, open-source large language models (LLMs) as a practical solution for real-world clinical NLP systems.

MATERIALS AND METHODS

We develop a dynamic few-shot prompting approach to classifying sections where section samples are first embedded using a transformer-based model and deposited in a vector store. During inference, the embedded samples with the most similar contextual embeddings to a given input section text are retrieved from the vector store and inserted into the LLM prompt. We evaluate this technique on two datasets comprising two section schemas, including varying levels of context. We compare the performance to baseline zero-shot and randomly selected few-shot scenarios.
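The retrieval step described above can be illustrated with a minimal sketch. This is not the authors' implementation: the toy bag-of-words `embed` function stands in for the transformer-based embedding model, the in-memory `STORE` list stands in for a vector store, and the sample texts, label names, and prompt wording are all hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical labeled section samples; texts and labels are illustrative only.
SAMPLES = [
    ("Patient reports chest pain radiating to the left arm.", "History of Present Illness"),
    ("Lisinopril 10 mg daily, metformin 500 mg twice daily.", "Medications"),
    ("No known drug allergies.", "Allergies"),
    ("Mother had type 2 diabetes; father had hypertension.", "Family History"),
    ("Aspirin 81 mg daily, atorvastatin 20 mg nightly.", "Medications"),
]


def embed(text):
    """Toy bag-of-words vector; an assumed stand-in for a transformer embedding."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


# In-memory "vector store": sample embeddings are computed once, before inference.
STORE = [(embed(text), text, label) for text, label in SAMPLES]


def retrieve(query, k=2):
    """Return the k stored (text, label) pairs most similar to the query."""
    q = embed(query)
    ranked = sorted(STORE, key=lambda item: cosine(q, item[0]), reverse=True)
    return [(text, label) for _, text, label in ranked[:k]]


def build_prompt(query, k=2):
    """Insert the retrieved examples into a few-shot classification prompt."""
    labels = sorted({label for _, label in SAMPLES})
    lines = ["Classify the clinical note section into one of: " + ", ".join(labels) + ".", ""]
    for text, label in retrieve(query, k):
        lines += [f"Section: {text}", f"Label: {label}", ""]
    lines += [f"Section: {query}", "Label:"]
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_prompt("Metformin 500 mg daily and aspirin 81 mg daily."))
```

In the random few-shot baseline, the examples would instead be drawn without regard to the input; the only change here is that `retrieve` ranks stored samples by similarity to the section being classified.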

RESULTS

The dynamic few-shot prompting experiments yielded the highest F1 scores in each of the classification tasks and datasets for all seven of the LLMs included in the evaluation, averaging a macro F1 increase of 39.3% and 21.1% in our primary section classification task over the zero-shot and static few-shot baselines, respectively.
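For readers unfamiliar with the metric, macro F1 is the unweighted mean of per-class F1 scores, so rare section types count as much as common ones. The sketch below shows the standard definition only; the label names are hypothetical, and the choice to average over every label seen in either the gold or predicted lists is an assumption, not a detail taken from the paper.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over all labels seen in either list."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never appears.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


if __name__ == "__main__":
    gold = ["Medications", "Medications", "Allergies", "Family History"]
    pred = ["Medications", "Allergies", "Allergies", "Family History"]
    print(round(macro_f1(gold, pred), 4))
```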

DISCUSSION AND CONCLUSION

Our results showcase substantial performance improvements imparted by dynamically selecting examples for few-shot LLM prompting, and further improvement by including section context, demonstrating compelling potential for clinical applications.
