The performance of artificial intelligence large language model-linked chatbots in surgical decision-making for gastroesophageal reflux disease.

Affiliations

Division of General Surgery, Department of Surgery, McMaster University, Hamilton, ON, Canada.

University of California South California, East Bay, Oakland, CA, USA.

Publication information

Surg Endosc. 2024 May;38(5):2320-2330. doi: 10.1007/s00464-024-10807-w. Epub 2024 Apr 17.

DOI:10.1007/s00464-024-10807-w

PMID:38630178

Abstract

BACKGROUND

Large language model (LLM)-linked chatbots may be an efficient source of clinical recommendations for healthcare providers and patients. This study evaluated the performance of LLM-linked chatbots in providing recommendations for the surgical management of gastroesophageal reflux disease (GERD).

METHODS

Nine patient cases were created based on key questions addressed by the Society of American Gastrointestinal and Endoscopic Surgeons (SAGES) guidelines for the surgical treatment of GERD. ChatGPT-3.5, ChatGPT-4, Copilot, Google Bard, and Perplexity AI were queried on November 16th, 2023, for recommendations regarding the surgical management of GERD. Accurate chatbot performance was defined as the number of responses aligning with SAGES guideline recommendations. Outcomes were reported with counts and percentages.

RESULTS

Surgeons were given accurate recommendations for the surgical management of GERD in an adult patient for 5/7 (71.4%) KQs by ChatGPT-4, 3/7 (42.9%) KQs by Copilot, 6/7 (85.7%) KQs by Google Bard, and 3/7 (42.9%) KQs by Perplexity according to the SAGES guidelines. Patients were given accurate recommendations for 3/5 (60.0%) KQs by ChatGPT-4, 2/5 (40.0%) KQs by Copilot, 4/5 (80.0%) KQs by Google Bard, and 1/5 (20.0%) KQs by Perplexity, respectively. In a pediatric patient, surgeons were given accurate recommendations for 2/3 (66.7%) KQs by ChatGPT-4, 3/3 (100.0%) KQs by Copilot, 3/3 (100.0%) KQs by Google Bard, and 2/3 (66.7%) KQs by Perplexity. Patients were given appropriate guidance for 2/2 (100.0%) KQs by ChatGPT-4, 2/2 (100.0%) KQs by Copilot, 1/2 (50.0%) KQs by Google Bard, and 1/2 (50.0%) KQs by Perplexity.
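The percentages above follow directly from the reported counts. As a minimal sketch, the snippet below re-derives them, assuming the counts are transcribed correctly from this paragraph. The names `reported`, `percent`, and `summarize` are illustrative and do not come from the study.

```python
# (accurate responses, key questions asked), copied from the RESULTS paragraph.
# Keys are (patient population, audience receiving the recommendation).
reported = {
    ("adult", "surgeon"): {
        "ChatGPT-4": (5, 7), "Copilot": (3, 7),
        "Google Bard": (6, 7), "Perplexity": (3, 7),
    },
    ("adult", "patient"): {
        "ChatGPT-4": (3, 5), "Copilot": (2, 5),
        "Google Bard": (4, 5), "Perplexity": (1, 5),
    },
    ("pediatric", "surgeon"): {
        "ChatGPT-4": (2, 3), "Copilot": (3, 3),
        "Google Bard": (3, 3), "Perplexity": (2, 3),
    },
    ("pediatric", "patient"): {
        "ChatGPT-4": (2, 2), "Copilot": (2, 2),
        "Google Bard": (1, 2), "Perplexity": (1, 2),
    },
}


def percent(accurate, total):
    """Share of guideline-aligned responses, as a percentage to one decimal."""
    return round(100 * accurate / total, 1)


def summarize(table):
    """Map each (population, audience) group to per-chatbot percentages."""
    return {
        group: {bot: percent(a, n) for bot, (a, n) in bots.items()}
        for group, bots in table.items()
    }


if __name__ == "__main__":
    for (population, audience), bots in summarize(reported).items():
        for bot, pct in bots.items():
            print(f"{population:9s} {audience:8s} {bot:12s} {pct:5.1f}%")
```

Because each group covers only two to seven key questions, a single response changes a percentage by 14 to 50 points, so the figures are easier to compare as raw counts than as percentages.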

CONCLUSIONS

Gastrointestinal surgeons, gastroenterologists, and patients should recognize both the promise and pitfalls of LLMs when they are used for advice on the surgical management of GERD. Additional training of LLMs using evidence-based health information is needed.
