Department of Internal Medicine (Digestive Diseases), Yale School of Medicine, New Haven, Connecticut, USA.
Department of Medical, Surgical, and Health Sciences, University of Trieste, Trieste, Italy.
Liver Int. 2024 Sep;44(9):2114-2124. doi: 10.1111/liv.15974. Epub 2024 May 31.
Large Language Models (LLMs) are transformer-based neural networks with billions of parameters, trained on very large text corpora drawn from diverse sources. LLMs have the potential to improve healthcare due to their capability to parse complex concepts and generate context-based responses. Interest in LLMs has not spared digestive disease academics, who have mainly investigated foundational LLM accuracy, which ranges from 25% to 90% and is influenced by the lack of standardized rules for reporting methodologies and results in LLM-oriented research. In addition, a critical issue is the absence of a universally accepted definition of accuracy, which varies from binary to scalar interpretations and is often tied to grader expertise without reference to clinical guidelines. We address strategies and challenges to increase accuracy. In particular, LLMs can be infused with domain knowledge using Retrieval Augmented Generation (RAG) or Supervised Fine-Tuning (SFT) with reinforcement learning from human feedback (RLHF). RAG faces challenges with context window limits and accurate information retrieval from the provided context. SFT, a deeper adaptation method, is computationally demanding and requires specialized knowledge. LLMs may increase the quality of patient care across the field of digestive diseases, where physicians are often engaged in screening, treatment and surveillance for a broad range of pathologies for which in-context learning or SFT with RLHF could improve clinical decision-making and patient outcomes. However, despite their potential, the safe deployment of LLMs in healthcare still needs to overcome hurdles in accuracy, suggesting a need for strategies that integrate human feedback with advanced model training.
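The RAG approach described above can be illustrated with a minimal sketch. This toy example uses a keyword-overlap retriever over a small set of illustrative guideline snippets and assembles the retrieved text into a prompt for in-context learning; a production system would instead use embedding-based search and a real LLM API. All function names, documents, and the question below are hypothetical, not from the article.

```python
# Minimal RAG sketch: retrieve relevant domain text, then prepend it to the
# prompt so the model can answer "in context". The retriever here ranks
# documents by simple word overlap with the question (a stand-in for
# embedding similarity); the guideline snippets are illustrative only.

def retrieve(question, documents, k=1):
    """Return the k documents sharing the most words with the question."""
    q_words = set(question.lower().split())
    scored = sorted(
        documents,
        key=lambda d: len(q_words & set(d.lower().split())),
        reverse=True,
    )
    return scored[:k]

def build_prompt(question, documents, k=1):
    """Assemble retrieved context and the question into one LLM prompt."""
    context = "\n".join(retrieve(question, documents, k))
    return f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"

# Hypothetical guideline snippets standing in for a clinical knowledge base.
guidelines = [
    "Colonoscopy screening for colorectal cancer should begin at age 45.",
    "Hepatocellular carcinoma surveillance uses ultrasound every 6 months.",
]

prompt = build_prompt(
    "At what age should colorectal cancer screening start?", guidelines
)
# 'prompt' would then be sent to an LLM; the retrieved guideline grounds
# the answer, at the cost of consuming part of the context window.
```

Note the trade-off the abstract points to: everything retrieved must fit within the model's context window, and a poor retriever (as this keyword heuristic often is) feeds the model irrelevant context.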