The Potential Clinical Utility of the Customized Large Language Model in Gastroenterology: A Pilot Study.

Author information

Gong Eun Jeong, Bang Chang Seok, Lee Jae Jun, Park Jonghyung, Kim Eunsil, Kim Subeen, Kimm Minjae, Choi Seoung-Ho

Affiliations

Department of Internal Medicine, Hallym University College of Medicine, Chuncheon 24253, Republic of Korea.

Institute for Liver and Digestive Diseases, Hallym University, Chuncheon 24253, Republic of Korea.

Publication information

Bioengineering (Basel). 2024 Dec 24;12(1):1. doi: 10.3390/bioengineering12010001.

DOI:10.3390/bioengineering12010001

PMID:39851275

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11760845/

Abstract

Large language models (LLMs) have the potential to be applied to clinical practice, but few studies have examined them in the field of gastroenterology. Aim: This study explores the potential clinical utility of two LLMs in gastroenterology: a customized GPT model and conventional GPT-4o, an advanced LLM capable of retrieval-augmented generation (RAG). We built the customized GPT with the BM25 algorithm on top of OpenAI's GPT-4o model, which allows it to produce responses in the context of specific documents, including textbooks of internal medicine (in English) and gastroenterology (in Korean). We also used conventional ChatGPT 4o (accessed on 16 October 2024). The benchmark (written in Korean) consisted of 15 clinical questions developed by four clinical experts, representing typical questions for medical students. The two LLMs, a gastroenterology fellow, and an expert gastroenterologist were tested to assess their performance. The customized LLM correctly answered 8 of 15 questions, whereas the fellow answered 10 correctly. When the standardized Korean medical terms were replaced with English terminology, the customized LLM's performance improved: it answered two additional knowledge-based questions correctly, matching the fellow's score. However, judgment-based questions remained a challenge for the model. Even with 'Chain of Thought' prompt engineering, the customized GPT did not achieve improved reasoning. Conventional GPT-4o achieved the highest score among the AI models (14/15). Although both models performed slightly below the expert gastroenterologist's level (15/15), they show promising potential for clinical applications, with scores comparable with or higher than that of the gastroenterology fellow. LLMs could be used to assist with specialized tasks such as patient counseling. However, RAG capabilities, which enable real-time retrieval of external data not included in the training dataset, appear essential for managing complex, specialized content, and clinician oversight will remain crucial to ensure safe and effective use in clinical practice.
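
The abstract names BM25 as the retrieval algorithm behind the customized GPT but gives no implementation details. As a rough illustration of how BM25 ranks textbook passages against a question, here is a minimal Python sketch. It is not the authors' pipeline: the class `BM25Index`, the whitespace tokenizer, the parameter values `k1=1.5` and `b=0.75`, the non-negative IDF variant, and the three sample passages are all assumptions made for this example.

```python
import math
from collections import Counter


def tokenize(text):
    """Naive lowercase whitespace tokenizer (an assumption; real systems
    would use a language-aware tokenizer, especially for Korean text)."""
    return text.lower().split()


class BM25Index:
    """Hypothetical minimal BM25 index over a list of text passages."""

    def __init__(self, passages, k1=1.5, b=0.75):
        self.k1 = k1  # term-frequency saturation
        self.b = b    # strength of document-length normalization
        self.docs = [tokenize(p) for p in passages]
        self.n = len(self.docs)
        self.avgdl = sum(len(d) for d in self.docs) / self.n
        self.tf = [Counter(d) for d in self.docs]
        # Document frequency: number of passages containing each term.
        df = Counter()
        for d in self.docs:
            df.update(set(d))
        # Non-negative IDF variant: log(1 + (N - df + 0.5) / (df + 0.5)).
        self.idf = {
            t: math.log(1 + (self.n - f + 0.5) / (f + 0.5))
            for t, f in df.items()
        }

    def score(self, query, i):
        """BM25 score of passage i for the query."""
        tf = self.tf[i]
        dl = len(self.docs[i])
        norm = self.k1 * (1 - self.b + self.b * dl / self.avgdl)
        total = 0.0
        for term in tokenize(query):
            f = tf.get(term, 0)
            if f:
                total += self.idf[term] * f * (self.k1 + 1) / (f + norm)
        return total

    def top_k(self, query, k=2):
        """Indices of the k highest-scoring passages, best first."""
        order = sorted(range(self.n),
                       key=lambda i: self.score(query, i),
                       reverse=True)
        return order[:k]


# Placeholder passages standing in for textbook excerpts (illustrative only).
passages = [
    "helicobacter pylori eradication therapy combines a proton pump inhibitor with antibiotics",
    "colonoscopy surveillance intervals depend on the number and size of adenomas",
    "acute pancreatitis is managed with fluid resuscitation and pain control",
]
index = BM25Index(passages)
query = "helicobacter pylori eradication"
best = index.top_k(query, k=1)[0]
context = passages[best]  # text that would be passed to the LLM as context
```

In a retrieval setup of this kind, the top-ranked passages are placed in the prompt so the model answers in the context of those documents. Because BM25 matches on exact terms, a question phrased in Korean terminology will not match passages written with English terms, which is one plausible reading of why swapping in English terminology improved the customized model's score.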
