Lahat Adi, Shachar Eyal, Avidan Benjamin, Glicksberg Benjamin, Klang Eyal
Chaim Sheba Medical Center, Department of Gastroenterology, Affiliated to Tel Aviv University, Tel Aviv 69978, Israel.
Mount Sinai Clinical Intelligence Center, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA.
Diagnostics (Basel). 2023 Jun 2;13(11):1950. doi: 10.3390/diagnostics13111950.
Patients frequently have concerns about their disease and find it challenging to obtain accurate information. OpenAI's ChatGPT chatbot (ChatGPT) is a new large language model developed to provide answers to a wide range of questions in various fields. Our aim was to evaluate the performance of ChatGPT in answering patients' questions regarding gastrointestinal health.
To evaluate the performance of ChatGPT in answering patients' questions, we used a representative sample of 110 real-life questions. The answers provided by ChatGPT were rated in consensus by three experienced gastroenterologists, who assessed each answer's accuracy, clarity, and efficacy.
ChatGPT was able to provide accurate and clear answers to patients' questions in some cases, but not in others. For questions about treatments, the average accuracy, clarity, and efficacy scores (on a scale of 1 to 5) were 3.9 ± 0.8, 3.9 ± 0.9, and 3.3 ± 0.9, respectively. For questions about symptoms, the average accuracy, clarity, and efficacy scores were 3.4 ± 0.8, 3.7 ± 0.7, and 3.2 ± 0.7, respectively. For questions about diagnostic tests, the average accuracy, clarity, and efficacy scores were 3.7 ± 1.7, 3.7 ± 1.8, and 3.5 ± 1.7, respectively.
While ChatGPT has potential as a source of medical information, further development is needed; the quality of its answers is contingent upon the quality of the online information from which it learns. These findings may help healthcare providers and patients alike understand the capabilities and limitations of ChatGPT.