

Model tuning or prompt tuning? A study of large language models for clinical concept and relation extraction.

Affiliations

Department of Health Outcomes and Biomedical Informatics, College of Medicine, University of Florida, Gainesville, FL, USA.

Department of Health Outcomes and Biomedical Informatics, College of Medicine, University of Florida, Gainesville, FL, USA; Cancer Informatics Shared Resource, University of Florida Health Cancer Center, Gainesville, FL, USA.

Publication Information

J Biomed Inform. 2024 May;153:104630. doi: 10.1016/j.jbi.2024.104630. Epub 2024 Mar 26.

Abstract

OBJECTIVE

To develop a soft prompt-based learning architecture for large language models (LLMs), examine prompt tuning with frozen and unfrozen LLMs, and assess their transfer-learning and few-shot-learning abilities.

METHODS

We developed a soft prompt-based learning architecture and compared 4 strategies including (1) fine-tuning without prompts; (2) hard-prompting with unfrozen LLMs; (3) soft-prompting with unfrozen LLMs; and (4) soft-prompting with frozen LLMs. We evaluated GatorTron, a clinical LLM with up to 8.9 billion parameters, and compared GatorTron with 4 existing transformer models for clinical concept and relation extraction on 2 benchmark datasets for adverse drug events and social determinants of health (SDoH). We evaluated the few-shot learning ability and generalizability for cross-institution applications.
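Strategy (4), soft prompting with a frozen LLM, can be sketched as follows: trainable continuous prompt embeddings are prepended to the token embeddings of a backbone whose weights are frozen, so only the prompt (and a small task head) receives gradient updates. This is a minimal PyTorch illustration, not the paper's implementation: a toy transformer encoder stands in for GatorTron, and the prompt length, hidden size, and per-token tagging head are illustrative assumptions.

```python
import torch
import torch.nn as nn

class SoftPromptModel(nn.Module):
    """Prepend trainable soft-prompt embeddings to a frozen encoder's input.

    The encoder is a toy stand-in for a clinical LLM; only the soft prompt
    and the task head are updated during training (strategy 4).
    """

    def __init__(self, embed, encoder, hidden_size, prompt_len=8, num_labels=3):
        super().__init__()
        self.embed = embed      # frozen token-embedding layer
        self.encoder = encoder  # frozen transformer body
        for p in self.embed.parameters():
            p.requires_grad = False
        for p in self.encoder.parameters():
            p.requires_grad = False
        # trainable continuous prompt, one vector per virtual token
        self.soft_prompt = nn.Parameter(torch.randn(prompt_len, hidden_size) * 0.02)
        # per-token classification head (e.g., BIO tags for concept extraction)
        self.head = nn.Linear(hidden_size, num_labels)

    def forward(self, input_ids):
        tok = self.embed(input_ids)                                   # (B, T, H)
        prompt = self.soft_prompt.unsqueeze(0).expand(tok.size(0), -1, -1)
        x = torch.cat([prompt, tok], dim=1)                           # (B, P+T, H)
        x = self.encoder(x)
        return self.head(x[:, prompt.size(1):])                       # drop prompt positions

# toy instantiation: vocabulary of 100, hidden size 32
H, V = 32, 100
embed = nn.Embedding(V, H)
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(H, nhead=4, batch_first=True), num_layers=2
)
model = SoftPromptModel(embed, encoder, hidden_size=H, prompt_len=8)

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"trainable: {trainable}/{total} parameters")

logits = model(torch.randint(0, V, (2, 5)))
print(logits.shape)  # torch.Size([2, 5, 3])
```

Because the backbone never receives gradient updates, the optimizer state and backward pass cover only the prompt and head parameters, which is the source of the computing-cost reduction reported in the results.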

RESULTS AND CONCLUSION

When LLMs are unfrozen, GatorTron-3.9B with soft prompting achieves the best strict F1-scores of 0.9118 and 0.8604 for concept extraction, outperforming the traditional fine-tuning and hard prompt-based models by 0.6–3.1% and 1.2–2.9%, respectively; GatorTron-345M with soft prompting achieves the best F1-scores of 0.8332 and 0.7488 for end-to-end relation extraction, outperforming the other two models by 0.2–2% and 0.6–11.7%, respectively. When LLMs are frozen, small LLMs lag well behind their unfrozen counterparts; scaling LLMs up to billions of parameters makes frozen LLMs competitive with unfrozen ones. Soft prompting with a frozen GatorTron-8.9B model achieved the best performance in cross-institution evaluation. We demonstrate that (1) machines can learn soft prompts better than hard prompts composed by humans, (2) frozen LLMs have good few-shot learning ability and generalizability for cross-institution applications, (3) frozen LLMs reduce computing cost to 2.5–6% of that of previous methods using unfrozen LLMs, and (4) frozen LLMs require large models (e.g., several billion parameters or more) for good performance.


Similar Articles

Clinical Prompt Learning With Frozen Language Models.
IEEE Trans Neural Netw Learn Syst. 2024 Nov;35(11):16453-16463. doi: 10.1109/TNNLS.2023.3294633. Epub 2024 Oct 29.

Cited By

Enhancing biomedical relation extraction with directionality.
Bioinformatics. 2025 Jul 1;41(Supplement_1):i68-i76. doi: 10.1093/bioinformatics/btaf226.

