Examining the Accuracy and Reproducibility of Responses to Nutrition Questions Related to Inflammatory Bowel Disease by Generative Pre-trained Transformer-4.

Author information

Samaan Jamil S, Issokson Kelly, Feldman Erin, Fasulo Christina, Rajeev Nithya, Ng Wee Han, Hollander Barbara, Yeo Yee Hui, Vasiliauskas Eric

Affiliations

Department of Medicine, Karsh Division of Digestive and Liver Diseases, Cedars-Sinai Medical Center, Los Angeles, CA, USA.

Keck School of Medicine of USC, Los Angeles, CA, USA.

Publication information

Crohns Colitis 360. 2025 Feb 19;7(1):otae077. doi: 10.1093/crocol/otae077. eCollection 2025 Jan.

DOI:10.1093/crocol/otae077

PMID:40078587

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11897593/

Abstract

BACKGROUND

Generative pre-trained transformer-4 (GPT-4) is a large language model (LLM) trained on a vast corpus of data, including the medical literature. Nutrition plays an important role in managing inflammatory bowel disease (IBD), with an unmet need for nutrition-related patient education resources. This study examines the accuracy, comprehensiveness, and reproducibility of responses by GPT-4 to patient nutrition questions related to IBD.

METHODS

Questions were obtained from adult IBD clinic visits, Facebook, and Reddit. Each question was entered into the model twice. Two IBD-focused registered dietitians independently graded the accuracy and reproducibility of GPT-4's responses, and a third, senior IBD-focused registered dietitian arbitrated.
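The two-grader-plus-arbiter workflow described above can be sketched in a few lines. This is only an illustration under stated assumptions: the abstract does not list the grading scale or say exactly when the senior dietitian stepped in, so the grade labels, the `resolve_grade` function, and the rule "arbitrate only when the two graders disagree" are all hypothetical.

```python
from typing import Callable

# Hypothetical grade labels; the abstract does not spell out the scale.
Grade = str


def resolve_grade(
    grade_a: Grade,
    grade_b: Grade,
    arbitrate: Callable[[Grade, Grade], Grade],
) -> Grade:
    """Return the final grade for one response.

    Assumption: if the two independent graders agree, their grade stands;
    otherwise the senior reviewer's decision is used.
    """
    if grade_a == grade_b:
        return grade_a
    return arbitrate(grade_a, grade_b)


def senior_reviewer(grade_a: Grade, grade_b: Grade) -> Grade:
    # Placeholder for a human decision; here it simply picks the more
    # cautious of the two hypothetical labels.
    return "mixed" if "mixed" in (grade_a, grade_b) else grade_a


agreed = resolve_grade("comprehensive", "comprehensive", senior_reviewer)
disputed = resolve_grade("comprehensive", "mixed", senior_reviewer)
print(agreed, disputed)
```

In practice the arbiter is a person, not a function; the callable only marks where that judgment enters the process.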

RESULTS

Eighty-eight questions were selected. The model responded correctly to 73/88 questions (83.0%), with 61 (69.0%) graded as comprehensive. The remaining 15/88 (17%) responses were graded as mixed, containing both correct and incorrect or outdated data. By category, the model responded comprehensively to 10 (62.5%) questions on "Nutrition and diet needs for surgery," 12 (92.3%) on "Tube feeding and parenteral nutrition," 11 (64.7%) on "General diet questions," 10 (50%) on "Diet for reducing symptoms/inflammation," and 18 (81.8%) on "Micronutrients/supplementation needs." The model provided reproducible responses to 81/88 (92.0%) questions.
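The proportions reported above can be rechecked with a short script. The sketch below recomputes the overall percentages from the stated counts and back-calculates how many questions each category contained. The per-category totals are not given in the abstract; they are inferred here from each count and its reported percentage, so treat them as an assumption. The helper names `pct` and `infer_total` are illustrative only.

```python
TOTAL_QUESTIONS = 88

# Counts taken directly from the Results paragraph.
overall_counts = {
    "correct": 73,
    "comprehensive": 61,
    "mixed": 15,
    "reproducible": 81,
}

# (comprehensive count, reported percentage) for each question category.
category_results = {
    "Nutrition and diet needs for surgery": (10, 62.5),
    "Tube feeding and parenteral nutrition": (12, 92.3),
    "General diet questions": (11, 64.7),
    "Diet for reducing symptoms/inflammation": (10, 50.0),
    "Micronutrients/supplementation needs": (18, 81.8),
}


def pct(count: int, total: int) -> float:
    """Percentage rounded to one decimal place."""
    return round(100 * count / total, 1)


def infer_total(count: int, reported_pct: float) -> int:
    """Back-calculate a category size (an inference, not a reported value)."""
    return round(count * 100 / reported_pct)


overall_pct = {name: pct(n, TOTAL_QUESTIONS) for name, n in overall_counts.items()}
category_totals = {
    name: infer_total(n, p) for name, (n, p) in category_results.items()
}

for name, value in overall_pct.items():
    print(f"{name}: {value}%")
for name, total in category_totals.items():
    print(f"{name}: {category_results[name][0]}/{total}")
print("sum of inferred category sizes:", sum(category_totals.values()))
```

Under these assumptions the inferred category sizes sum to 88 and the comprehensive counts sum to 61, which is consistent with the overall figures. Note that 61/88 evaluates to about 69.3%, marginally different from the 69.0% stated in the text.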

CONCLUSIONS

GPT-4 comprehensively answered most questions, demonstrating the promising potential of LLMs as supplementary tools for IBD patients seeking nutrition-related information. However, 17% of responses contained incorrect information, highlighting the need for continuous refinement prior to incorporation into clinical practice. Future studies should emphasize leveraging LLMs to enhance patient outcomes and promoting patient and healthcare professional proficiency in using LLMs to maximize their efficacy.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/efde/11897593/eeabd22bfc3c/otae077_fig2.jpg
