Evaluating the Performance of ChatGPT-4o Oncology Expert in Comparison to Standard Medical Oncology Knowledge: A Focus on Treatment-Related Clinical Questions.

Authors

Kinikoglu Oguzcan, Isik Deniz

Affiliation

Medical Oncology, Kartal Dr. Lütfi Kirdar City Hospital, Health Science University, Istanbul, TUR.

Publication information

Cureus. 2025 Jan 27;17(1):e78076. doi: 10.7759/cureus.78076. eCollection 2025 Jan.

DOI:10.7759/cureus.78076

PMID:39872919

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11771770/

Abstract

Integrating artificial intelligence (AI) into oncology can revolutionize decision-making by providing accurate information. This study evaluates the performance of ChatGPT-4o (OpenAI, San Francisco, CA) Oncology Expert in addressing open-ended clinical oncology questions. Thirty-seven treatment-related questions on solid organ tumors were selected from a hematology-oncology textbook. Responses from ChatGPT-4o Oncology Expert and the textbook were anonymized and independently evaluated by two medical oncologists using a structured scoring system focused on accuracy and clinical justification. Statistical analysis, including paired t-tests, was conducted to compare scores, and interrater reliability was assessed using Cohen's kappa. Oncology Expert achieved a significantly higher average score of 7.83 compared to the textbook's 7.0 (p < 0.01). In 10 cases, Oncology Expert provided more accurate and up-to-date answers, demonstrating its ability to integrate recent medical knowledge. In 26 cases, both sources provided equally relevant answers, but the Oncology Expert's responses were clearer and easier to understand. Cohen's kappa indicated almost perfect agreement (κ = 0.93). Both sources included outdated information for bladder cancer treatment, underscoring the need for regular updates. ChatGPT-4o Oncology Expert shows significant potential as a clinical tool in oncology by offering precise, up-to-date, and user-friendly responses. It could transform oncology practice by enhancing decision-making efficiency, improving educational tools, and serving as a reliable adjunct to clinical workflows. However, its integration requires regular updates, expert validation, and a collaborative approach to ensure reliability and relevance in the rapidly evolving field of oncology.
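For readers unfamiliar with the two statistics named in the abstract, the following is a minimal sketch of how a paired t statistic and Cohen's kappa can be computed. It is not the authors' analysis code: the function names are hypothetical, the scores are made-up illustrative values (not study data), and the kappa shown is the unweighted form, which is an assumption because the abstract does not say whether a weighted variant was used.

```python
import math
from collections import Counter
from statistics import mean, stdev


def paired_t_statistic(a, b):
    """Return (t, degrees_of_freedom) for a paired t-test on two samples."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    diffs = [x - y for x, y in zip(a, b)]
    sd = stdev(diffs)  # sample standard deviation of the paired differences
    if sd == 0:
        raise ValueError("all paired differences are identical; t is undefined")
    n = len(diffs)
    return mean(diffs) / (sd / math.sqrt(n)), n - 1


def cohens_kappa(rater1, rater2):
    """Unweighted Cohen's kappa for two raters assigning categorical labels."""
    if len(rater1) != len(rater2) or not rater1:
        raise ValueError("need two equal-length, non-empty rating lists")
    n = len(rater1)
    # Observed agreement: fraction of items on which the raters match.
    observed = sum(x == y for x, y in zip(rater1, rater2)) / n
    # Chance agreement: product of each rater's marginal label frequencies.
    c1, c2 = Counter(rater1), Counter(rater2)
    expected = sum(c1[k] * c2[k] for k in c1.keys() | c2.keys()) / (n * n)
    if expected == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1 - expected)


# Hypothetical scores for six questions (illustration only, not study data).
ai_scores = [8, 7, 9, 8, 8, 7]
textbook_scores = [7, 7, 8, 7, 8, 6]
t, df = paired_t_statistic(ai_scores, textbook_scores)

# Hypothetical scores from two raters for the same six answers.
rater_a = [8, 7, 9, 8, 8, 7]
rater_b = [8, 7, 9, 8, 7, 7]
kappa = cohens_kappa(rater_a, rater_b)

print(f"t = {t:.3f} (df = {df}), kappa = {kappa:.3f}")
```

Turning the t statistic into a p-value requires the Student t distribution, which the standard library does not provide; with SciPy installed, `scipy.stats.ttest_rel` returns both the statistic and the p-value.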
