
Application of large language models in clinical record correction: a comprehensive study on various retraining methods.

Authors

Maitin Ana M, Nogales Alberto, Fernández-Rincón Sergio, Aranguren Enrique, Cervera-Barba Emilio, Denizon-Arranz Sophia, Mateos-Rodríguez Alonso, García-Tejedor Álvaro J

Affiliations

CEIEC, Universidad Francisco de Vitoria, Pozuelo de Alarcón, 28223 Madrid, Spain.

Facultad de Medicina, Universidad Francisco de Vitoria, Pozuelo de Alarcón, 28223 Madrid, Spain.

Publication Information

J Am Med Inform Assoc. 2025 Feb 1;32(2):341-348. doi: 10.1093/jamia/ocae302.

Abstract

OBJECTIVES

We evaluate the effectiveness of large language models (LLMs), specifically GPT-based (GPT-3.5 and GPT-4) and Llama-2 models (13B and 7B architectures), in autonomously assessing clinical records (CRs) to enhance medical education and diagnostic skills.

MATERIALS AND METHODS

Various techniques, including prompt engineering, fine-tuning (FT), and low-rank adaptation (LoRA), were implemented and compared on Llama-2 7B. These methods were assessed using prompts in both English and Spanish to determine their adaptability to different languages. Performance was benchmarked against GPT-3.5, GPT-4, and Llama-2 13B.
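The study's training code is not included in the abstract, but the core mechanism behind LoRA, the technique the authors compare against full fine-tuning, can be illustrated independently: the pretrained weight matrix W is frozen, and only a low-rank update dW = (alpha / r) * B @ A is learned. A minimal NumPy sketch (dimensions chosen for illustration, not taken from the paper):

```python
import numpy as np

# LoRA freezes the pretrained weight W and learns a low-rank update
# dW = (alpha / r) * B @ A, with B of shape (d_out, r) and A of shape (r, d_in).
d_out, d_in, r, alpha = 4096, 4096, 8, 16

rng = np.random.default_rng(0)
W = rng.standard_normal((d_out, d_in))      # frozen pretrained weight
A = rng.standard_normal((r, d_in)) * 0.01   # trainable, rank r
B = np.zeros((d_out, r))                    # trainable, initialized to zero

def lora_forward(x):
    # Equivalent to (W + (alpha / r) * B @ A) @ x,
    # computed without materializing the full d_out x d_in update.
    return W @ x + (alpha / r) * (B @ (A @ x))

x = rng.standard_normal(d_in)
y = lora_forward(x)

full_params = W.size
lora_params = A.size + B.size
print(f"trainable params: {lora_params} of {full_params} "
      f"({100 * lora_params / full_params:.2f}%)")
```

Because B starts at zero, the adapted model initially reproduces the frozen model exactly, and only the small A and B matrices (here well under 1% of the full weight count) are updated during training, which is what makes adapting a 7B-parameter open-source model tractable.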

RESULTS

GPT-based models, particularly GPT-4, demonstrated promising performance closely aligned with specialist evaluations. Applying FT to Llama-2 7B improved text comprehension in Spanish, matching its performance with Spanish prompts to that of Llama-2 13B with English prompts. Low-rank adaptation significantly enhanced performance, surpassing GPT-3.5 when combined with FT. This indicates LoRA's effectiveness in adapting open-source models to specific tasks.

DISCUSSION

While GPT-4 showed superior performance, FT and LoRA on Llama-2 7B proved crucial in improving language comprehension and task-specific accuracy. Identified limitations highlight the need for further research.

CONCLUSION

This study underscores the potential of LLMs in medical education, providing an innovative, effective approach to CR correction. Low-rank adaptation emerged as the most effective technique, enabling open-source models to perform on par with proprietary models. Future research should focus on overcoming current limitations to further improve model performance.


