
RAMIE: retrieval-augmented multi-task information extraction with large language models on dietary supplements.

Author information

Zhan Zaifu, Zhou Shuang, Li Mingchen, Zhang Rui

Affiliations

Department of Electrical and Computer Engineering, University of Minnesota, Minneapolis, MN 55455, United States.

Division of Computational Health Sciences, Department of Surgery, University of Minnesota, Minneapolis, MN 55455, United States.

Publication information

J Am Med Inform Assoc. 2025 Mar 1;32(3):545-554. doi: 10.1093/jamia/ocaf002.

Abstract

OBJECTIVE

To develop an advanced multi-task large language model (LLM) framework for extracting diverse types of information about dietary supplements (DSs) from clinical records.

METHODS

We focused on 4 core DS information extraction tasks: named entity recognition (2 949 clinical sentences), relation extraction (4 892 sentences), triple extraction (2 949 sentences), and usage classification (2 460 sentences). To address these tasks, we introduced the retrieval-augmented multi-task information extraction (RAMIE) framework, which incorporates: (1) instruction fine-tuning with task-specific prompts; (2) multi-task training of LLMs to enhance storage efficiency and reduce training costs; and (3) retrieval-augmented generation, which retrieves similar examples from the training set to improve task performance. We compared the performance of RAMIE to LLMs with instruction fine-tuning alone and conducted an ablation study to evaluate the individual contributions of multi-task learning and retrieval-augmented generation to overall performance improvements.
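The retrieval-augmented step described above — retrieving similar examples from the training set and prepending them to a task-specific prompt — can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names (`retrieve_examples`, `build_prompt`) and the bag-of-words cosine similarity are assumptions made for a self-contained example; a framework like RAMIE would use a dense embedding retriever over the training set.

```python
from collections import Counter
from math import sqrt

def vectorize(text):
    # Bag-of-words term counts; a real retriever would use dense embeddings.
    return Counter(text.lower().split())

def cosine(a, b):
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[t] * b[t] for t in a)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve_examples(query, train_set, k=2):
    """Return the k training examples most similar to the query sentence."""
    qv = vectorize(query)
    ranked = sorted(train_set,
                    key=lambda ex: cosine(qv, vectorize(ex["input"])),
                    reverse=True)
    return ranked[:k]

def build_prompt(instruction, query, examples):
    """Assemble a task-specific prompt with retrieved demonstrations."""
    demos = "\n".join(f"Input: {ex['input']}\nOutput: {ex['output']}"
                      for ex in examples)
    return f"{instruction}\n{demos}\nInput: {query}\nOutput:"
```

The assembled prompt would then be passed to the instruction-fine-tuned LLM; the retrieved demonstrations give the model in-context examples of the expected output format for each extraction task.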

RESULTS

Using the RAMIE framework, Llama2-13B achieved an F1 score of 87.39 on the named entity recognition task, reflecting a 3.51% improvement. It also excelled in the relation extraction task with an F1 score of 93.74, a 1.15% improvement. For the triple extraction task, Llama2-7B achieved an F1 score of 79.45, representing a significant 14.26% improvement. MedAlpaca-7B delivered the highest F1 score of 93.45 on the usage classification task, with a 0.94% improvement. The ablation study highlighted that while multi-task learning improved efficiency with a minor trade-off in performance, the inclusion of retrieval-augmented generation significantly enhanced overall accuracy across tasks.

CONCLUSION

The RAMIE framework demonstrates substantial improvements in multi-task information extraction for DS-related data from clinical records.


Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e253/11833482/3e66e7bdb732/ocaf002f1.jpg
