ChatGPT-4o outperforms Gemini Advanced in assisting multidisciplinary decision-making for advanced gastric cancer.
Author information
Li Huizi, Huang Jiaobao, Liu Kuntang, Liu Jibiao, Liu Queling, Zhou Zhiyong, Zong Zhen, Mao Shengxun
Affiliations
Department of General Surgery, The Second Affiliated Hospital of Nanchang University, Nanchang, Jiangxi, China.
Publication information
Eur J Surg Oncol. 2025 Apr 24;51(8):110096. doi: 10.1016/j.ejso.2025.110096.
BACKGROUND & AIMS
The treatment of advanced gastric cancer (GC) requires precise and comprehensive clinical decision-making. Artificial intelligence (AI) chatbots are potential tools for enhancing multidisciplinary team (MDT) discussions. This study aims to compare the performance of ChatGPT-4o and Gemini Advanced in generating treatment recommendations for advanced GC.
METHODS
The study involved three steps: (1) evaluating responses to ten critical clinical questions, (2) analyzing clinical cases from MDT meetings at our institution, and (3) reviewing rare GC cases from PubMed. It included 95 advanced GC patients discussed between November 2022 and July 2024, and 14 rare cases from PubMed. Prompts designed from advanced GC cases were submitted to ChatGPT-4o and Gemini Advanced using a standardized format. Outputs were evaluated for accuracy and completeness using a structured 4-point Likert scale. Interrater reliability was calculated to ensure consistency among evaluators.
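The abstract does not name the interrater reliability statistic that was used. As an illustration only, the sketch below computes unweighted Cohen's kappa for two raters scoring the same outputs on a 4-point scale. The function name `cohen_kappa` and the ratings are hypothetical and are not taken from the study; for ordinal Likert data, a weighted kappa or an intraclass correlation coefficient may be a better fit.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two raters scoring the same items.

    Hypothetical helper for illustration; the study's actual statistic
    is not stated in the abstract.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must score the same non-empty item list")

    n = len(rater_a)

    # Observed agreement: share of items given identical scores.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Chance agreement: product of each rater's marginal proportions,
    # summed over every score category that appears.
    count_a = Counter(rater_a)
    count_b = Counter(rater_b)
    categories = set(count_a) | set(count_b)
    expected = sum(count_a[c] * count_b[c] for c in categories) / (n * n)

    if expected == 1.0:
        # Both raters used a single identical category for every item.
        return 1.0
    return (observed - expected) / (1 - expected)


# Made-up 4-point Likert scores (1 = poor, 4 = excellent), not study data.
scores_rater_1 = [4, 3, 4, 2, 3, 4, 1, 3]
scores_rater_2 = [4, 3, 3, 2, 3, 4, 2, 3]
kappa = cohen_kappa(scores_rater_1, scores_rater_2)
print(f"Cohen's kappa: {kappa:.2f}")
```

Kappa is 1 for perfect agreement and 0 for agreement no better than chance, so it is a stricter measure than the raw percentage of matching scores.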
RESULTS
For the ten clinical questions, ChatGPT-4o performed better than Gemini Advanced. In MDT cases, ChatGPT-4o provided more valuable recommendations on surgical suggestions, chemotherapy recommendations, and chemotherapy regimens. Subgroup analysis confirmed these findings in both routine and complex cases, with high interrater reliability. ChatGPT-4o also outperformed Gemini Advanced in the analysis of rare GC cases from PubMed, showing superior accuracy with high interrater reliability.
CONCLUSIONS
While our findings suggest that AI chatbots can generate clinically relevant and guideline-based treatment recommendations, their use in MDT decision-making should be viewed as supportive rather than autonomous. We emphasize that although AI chatbots have potential as decision-support tools, they should be integrated into real-world clinical practice only under expert supervision.