The performance of artificial intelligence large language model-linked chatbots in surgical decision-making for gastroesophageal reflux disease.

Affiliations

Division of General Surgery, Department of Surgery, McMaster University, Hamilton, ON, Canada.

University of California South California, East Bay, Oakland, CA, USA.

Publication information

Surg Endosc. 2024 May;38(5):2320-2330. doi: 10.1007/s00464-024-10807-w. Epub 2024 Apr 17.

DOI: 10.1007/s00464-024-10807-w
PMID: 38630178

Abstract

BACKGROUND

Large language model (LLM)-linked chatbots may be an efficient source of clinical recommendations for healthcare providers and patients. This study evaluated the performance of LLM-linked chatbots in providing recommendations for the surgical management of gastroesophageal reflux disease (GERD).

METHODS

Nine patient cases were created based on key questions addressed by the Society of American Gastrointestinal and Endoscopic Surgeons (SAGES) guidelines for the surgical treatment of GERD. ChatGPT-3.5, ChatGPT-4, Copilot, Google Bard, and Perplexity AI were queried on November 16th, 2023, for recommendations regarding the surgical management of GERD. Accurate chatbot performance was defined as the number of responses aligning with SAGES guideline recommendations. Outcomes were reported with counts and percentages.

RESULTS

According to the SAGES guidelines, surgeons were given accurate recommendations for the surgical management of GERD in an adult patient for 5/7 (71.4%) key questions (KQs) by ChatGPT-4, 3/7 (42.9%) KQs by Copilot, 6/7 (85.7%) KQs by Google Bard, and 3/7 (42.9%) KQs by Perplexity. Patients were given accurate recommendations for 3/5 (60.0%) KQs by ChatGPT-4, 2/5 (40.0%) KQs by Copilot, 4/5 (80.0%) KQs by Google Bard, and 1/5 (20.0%) KQs by Perplexity. In a pediatric patient, surgeons were given accurate recommendations for 2/3 (66.7%) KQs by ChatGPT-4, 3/3 (100.0%) KQs by Copilot, 3/3 (100.0%) KQs by Google Bard, and 2/3 (66.7%) KQs by Perplexity. Patients were given appropriate guidance for 2/2 (100.0%) KQs by ChatGPT-4, 2/2 (100.0%) KQs by Copilot, 1/2 (50.0%) KQs by Google Bard, and 1/2 (50.0%) KQs by Perplexity.
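
The percentages above follow directly from the reported counts. As a minimal sketch (not the authors' analysis code), the snippet below transcribes the counts from this paragraph and recomputes each percentage as accurate responses divided by key questions asked, rounded to one decimal place. The names `RESULTS`, `accuracy_pct`, and `summarize` are illustrative, and it assumes the second sentence of the paragraph refers to the adult case.

```python
# Counts transcribed from the RESULTS paragraph:
# (responses aligned with SAGES guidelines, key questions asked).
# Assumption: the second sentence of the paragraph describes the adult case.
RESULTS = {
    ("adult", "surgeon"): {
        "ChatGPT-4": (5, 7), "Copilot": (3, 7),
        "Google Bard": (6, 7), "Perplexity": (3, 7),
    },
    ("adult", "patient"): {
        "ChatGPT-4": (3, 5), "Copilot": (2, 5),
        "Google Bard": (4, 5), "Perplexity": (1, 5),
    },
    ("pediatric", "surgeon"): {
        "ChatGPT-4": (2, 3), "Copilot": (3, 3),
        "Google Bard": (3, 3), "Perplexity": (2, 3),
    },
    ("pediatric", "patient"): {
        "ChatGPT-4": (2, 2), "Copilot": (2, 2),
        "Google Bard": (1, 2), "Perplexity": (1, 2),
    },
}


def accuracy_pct(accurate: int, total: int) -> float:
    """Share of guideline-aligned responses, as a percentage with one decimal."""
    return round(100 * accurate / total, 1)


def summarize(results: dict) -> dict:
    """Map each (case, audience) group to {chatbot: percentage}."""
    return {
        group: {bot: accuracy_pct(a, n) for bot, (a, n) in bots.items()}
        for group, bots in results.items()
    }


if __name__ == "__main__":
    for (case, audience), bots in summarize(RESULTS).items():
        row = ", ".join(f"{bot}: {pct:.1f}%" for bot, pct in bots.items())
        print(f"{case} case, {audience} prompt -> {row}")
```

With denominators this small (2 to 7 key questions per group), a single response changes a percentage by roughly 14 to 50 points, so the counts are more informative than the percentages alone.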

CONCLUSIONS

Gastrointestinal surgeons, gastroenterologists, and patients should recognize both the promise and pitfalls of LLMs when utilized for advice on surgical management of GERD. Additional training of LLMs using evidence-based health information is needed.
