
The Potential Clinical Utility of the Customized Large Language Model in Gastroenterology: A Pilot Study.

Authors

Gong Eun Jeong, Bang Chang Seok, Lee Jae Jun, Park Jonghyung, Kim Eunsil, Kim Subeen, Kimm Minjae, Choi Seoung-Ho

Affiliations

Department of Internal Medicine, Hallym University College of Medicine, Chuncheon 24253, Republic of Korea.

Institute for Liver and Digestive Diseases, Hallym University, Chuncheon 24253, Republic of Korea.

Publication information

Bioengineering (Basel). 2024 Dec 24;12(1):1. doi: 10.3390/bioengineering12010001.

DOI: 10.3390/bioengineering12010001
PMID: 39851275
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11760845/
Abstract

Large language models (LLMs) have the potential to be applied to clinical practice. However, few studies have examined this in the field of gastroenterology. Aim: This study explores the potential clinical utility of two LLMs in the field of gastroenterology: a customized GPT model and a conventional GPT-4o, an advanced LLM capable of retrieval-augmented generation (RAG). We established a customized GPT with the BM25 algorithm using OpenAI's GPT-4o model, which allows it to produce responses in the context of specific documents, including textbooks of internal medicine (in English) and gastroenterology (in Korean). We also used a conventional ChatGPT 4o (accessed on 16 October 2024). The benchmark (written in Korean) consisted of 15 clinical questions developed by four clinical experts, representing typical questions for medical students. The two LLMs, a gastroenterology fellow, and an expert gastroenterologist were tested to assess their performance. The customized LLM correctly answered 8 out of 15 questions, while the fellow answered 10 correctly. When the standardized Korean medical terms were replaced with English terminology, the customized LLM's performance improved: it answered two additional knowledge-based questions correctly, matching the fellow's score. However, judgment-based questions remained a challenge for the model. Even with 'Chain of Thought' prompt engineering, the customized GPT did not achieve improved reasoning. Conventional GPT-4o achieved the highest score among the AI models (14/15). Although both models performed slightly below the expert gastroenterologist's level (15/15), they show promising potential for clinical applications (scores comparable with or higher than that of the gastroenterology fellow). LLMs could be utilized to assist with specialized tasks such as patient counseling. However, RAG capabilities, which enable real-time retrieval of external data not included in the training dataset, appear essential for managing complex, specialized content, and clinician oversight will remain crucial to ensure safe and effective use in clinical practice.
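
The abstract names BM25 as the retrieval algorithm behind the customized GPT but gives no implementation details. The following is a minimal sketch of how BM25 retrieval can feed a prompt, not the authors' pipeline. The whitespace tokenizer, the parameters `k1=1.5` and `b=0.75`, the sample passages, the prompt wording, and the function names `bm25_scores`, `retrieve`, and `build_prompt` are all assumptions made for illustration. The IDF formula is the commonly used non-negative variant.

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each tokenized document against a tokenized query with BM25.

    k1 and b are common default values, assumed here for illustration.
    """
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of documents containing each term.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if term not in tf:
                continue  # BM25 is lexical: unmatched terms contribute nothing
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores

def retrieve(query, passages, top_k=2):
    """Return the top_k passages by BM25 score (naive whitespace tokenizer)."""
    tokenized = [p.lower().split() for p in passages]
    scores = bm25_scores(query.lower().split(), tokenized)
    ranked = sorted(range(len(passages)), key=lambda i: scores[i], reverse=True)
    return [passages[i] for i in ranked[:top_k]]

def build_prompt(question, passages, top_k=2):
    """Assemble a hypothetical prompt that grounds the answer in retrieved text."""
    context = "\n".join(f"- {p}" for p in retrieve(question, passages, top_k))
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )

# Hypothetical passages standing in for textbook excerpts.
SAMPLE_PASSAGES = [
    "Helicobacter pylori infection is a major cause of peptic ulcer disease",
    "Colonoscopy is used to screen for colorectal cancer",
    "Hepatitis B virus is transmitted through blood and body fluids",
]

if __name__ == "__main__":
    print(build_prompt("what causes peptic ulcer disease", SAMPLE_PASSAGES, top_k=1))
```

Because BM25 matches exact tokens, a query term that does not appear verbatim in a passage adds nothing to that passage's score. Korean text would also need a proper morphological tokenizer in place of the whitespace split assumed here.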

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4dad/11760845/eb1f54b2a090/bioengineering-12-00001-g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4dad/11760845/4c530c602907/bioengineering-12-00001-g001.jpg
