
Benchmarking ChatGPT-4 on a radiation oncology in-training exam and Red Journal Gray Zone cases: potentials and challenges for AI-assisted medical education and decision making in radiation oncology.

Author Information

Huang Yixing, Gomaa Ahmed, Semrau Sabine, Haderlein Marlen, Lettmaier Sebastian, Weissmann Thomas, Grigo Johanna, Tkhayat Hassen Ben, Frey Benjamin, Gaipl Udo, Distel Luitpold, Maier Andreas, Fietkau Rainer, Bert Christoph, Putz Florian

Affiliations

Department of Radiation Oncology, University Hospital Erlangen, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany.

Comprehensive Cancer Center Erlangen-EMN (CCC ER-EMN), Erlangen, Germany.

Publication Information

Front Oncol. 2023 Sep 14;13:1265024. doi: 10.3389/fonc.2023.1265024. eCollection 2023.


DOI: 10.3389/fonc.2023.1265024
PMID: 37790756
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10543650/
Abstract

PURPOSE: The potential of large language models in medicine for education and decision-making purposes has been demonstrated as they have achieved decent scores on medical exams such as the United States Medical Licensing Exam (USMLE) and the MedQA exam. This work aims to evaluate the performance of ChatGPT-4 in the specialized field of radiation oncology.

METHODS: The 38th American College of Radiology (ACR) radiation oncology in-training (TXIT) exam and the 2022 Red Journal Gray Zone cases are used to benchmark the performance of ChatGPT-4. The TXIT exam contains 300 questions covering various topics of radiation oncology. The 2022 Gray Zone collection contains 15 complex clinical cases.

RESULTS: On the TXIT exam, ChatGPT-3.5 and ChatGPT-4 achieved scores of 62.05% and 78.77%, respectively, highlighting the advantage of the newer ChatGPT-4 model. Based on the TXIT exam, ChatGPT-4's strong and weak areas in radiation oncology are identified to some extent. Specifically, per the ACR knowledge domains, ChatGPT-4 demonstrates better knowledge of statistics, CNS & eye, pediatrics, biology, and physics than of bone & soft tissue and gynecology. Regarding clinical care paths, ChatGPT-4 performs better in diagnosis, prognosis, and toxicity than in brachytherapy and dosimetry. It lacks proficiency in the in-depth details of clinical trials. For the Gray Zone cases, ChatGPT-4 is able to suggest a personalized treatment approach for each case with high correctness and comprehensiveness. Importantly, for many cases it provides novel treatment aspects that were not suggested by any of the human experts.

CONCLUSION: Both evaluations demonstrate the potential of ChatGPT-4 in medical education for the general public and cancer patients, as well as its potential to aid clinical decision-making, while acknowledging its limitations in certain domains. Owing to the risk of hallucinations, it is essential to verify content generated by models such as ChatGPT for accuracy.
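The benchmarking procedure summarized above amounts to posing multiple-choice exam questions to a model and computing accuracy both overall and per knowledge domain. A minimal sketch of such a scoring loop is shown below; the question format, category labels, and the `ask_model` callable are hypothetical illustrations, not the authors' actual pipeline.

```python
from collections import defaultdict


def score_exam(questions, ask_model):
    """Score a multiple-choice exam and break accuracy down by category.

    `questions` is a list of dicts with 'prompt', 'answer', and 'category'
    keys; `ask_model` maps a prompt string to the model's chosen option.
    Returns (overall accuracy, per-category accuracy dict).
    """
    correct_by_cat = defaultdict(int)
    total_by_cat = defaultdict(int)
    for q in questions:
        total_by_cat[q["category"]] += 1
        if ask_model(q["prompt"]) == q["answer"]:
            correct_by_cat[q["category"]] += 1
    per_category = {
        cat: correct_by_cat[cat] / total_by_cat[cat] for cat in total_by_cat
    }
    overall = sum(correct_by_cat.values()) / sum(total_by_cat.values())
    return overall, per_category
```

Per-category breakdowns like this are what allow the authors to report domain-level strengths (e.g. statistics, physics) and weaknesses (e.g. brachytherapy, dosimetry) rather than a single aggregate score.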

Figures
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/91d8/10543650/a0601c00d24a/fonc-13-1265024-g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/91d8/10543650/679db65f200e/fonc-13-1265024-g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/91d8/10543650/86c22ac1755e/fonc-13-1265024-g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/91d8/10543650/052514021c02/fonc-13-1265024-g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/91d8/10543650/8388378ab443/fonc-13-1265024-g005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/91d8/10543650/4da0523b4a29/fonc-13-1265024-g006.jpg

Similar Articles

[1]
Benchmarking ChatGPT-4 on a radiation oncology in-training exam and Red Journal Gray Zone cases: potentials and challenges for AI-assisted medical education and decision making in radiation oncology.

Front Oncol. 2023-9-14

[2]
ChatGPT Earns American Board Certification in Hand Surgery.

Hand Surg Rehabil. 2024-6

[3]
Evaluating large language models on a highly-specialized topic, radiation oncology physics.

Front Oncol. 2023-7-17

[4]
Assessing question characteristic influences on ChatGPT's performance and response-explanation consistency: Insights from Taiwan's Nursing Licensing Exam.

Int J Nurs Stud. 2024-5

[5]
Is ChatGPT ready for primetime? Performance of artificial intelligence on a simulated Canadian urology board exam.

Can Urol Assoc J. 2024-10

[6]
ChatGPT Conquers the Saudi Medical Licensing Exam: Exploring the Accuracy of Artificial Intelligence in Medical Knowledge Assessment and Implications for Modern Medical Education.

Cureus. 2023-9-11

[7]
How Does ChatGPT Perform on the United States Medical Licensing Examination (USMLE)? The Implications of Large Language Models for Medical Education and Knowledge Assessment.

JMIR Med Educ. 2023-2-8

[8]
Performance of ChatGPT on USMLE: Potential for AI-assisted medical education using large language models.

PLOS Digit Health. 2023-2-9

[9]
Performance of ChatGPT Across Different Versions in Medical Licensing Examinations Worldwide: Systematic Review and Meta-Analysis.

J Med Internet Res. 2024-7-25

[10]
Pure Wisdom or Potemkin Villages? A Comparison of ChatGPT 3.5 and ChatGPT 4 on USMLE Step 3 Style Questions: Quantitative Analysis.

JMIR Med Educ. 2024-1-5

Cited By

[1]
Artificial intelligence as treatment support in breast cancer: current perspectives.

Breast. 2025-8-22

[2]
Artificial intelligence across the cancer care continuum.

Cancer. 2025-8-15

[3]
Development and evaluation of large-language models (LLMs) for oncology: A scoping review.

PLOS Digit Health. 2025-8-7

[4]
In Reply to Sengul I and Sengul D.

Adv Radiat Oncol. 2025-7-10

[5]
Comparative analysis of ChatGPT 3.5 and ChatGPT 4 obstetric and gynecological knowledge.

Sci Rep. 2025-7-1

[6]
Assessing the value of artificial intelligence-based image analysis for pre-operative surgical planning of neck dissections and iENE detection in head and neck cancer patients.

Discov Oncol. 2025-5-30

[7]
How valuable are the questions and answers generated by large language models in oral and maxillofacial surgery?

PLoS One. 2025-5-28

[8]
The Accuracy of ChatGPT-4o in Interpreting Chest and Abdominal X-Ray Images.

J Pers Med. 2025-5-10

[9]
A Comparative Analysis of GPT-4o and ERNIE Bot in a Chinese Radiation Oncology Exam.

J Cancer Educ. 2025-5-26

[10]
Role of Generative Artificial Intelligence in Personalized Medicine: A Systematic Review.

Cureus. 2025-4-15

References

[1]
Interactive computer-aided diagnosis on medical image using large language models.

Commun Eng. 2024-9-17

[2]
Evaluating large language models on a highly-specialized topic, radiation oncology physics.

Front Oncol. 2023-7-17

[3]
ChatDoctor: A Medical Chat Model Fine-Tuned on a Large Language Model Meta-AI (LLaMA) Using Medical Domain Knowledge.

Cureus. 2023-6-24

[4]
Large language models encode clinical knowledge.

Nature. 2023-8

[5]
Large language model AI chatbots require approval as medical devices.

Nat Med. 2023-10

[6]
ChatGPT, Bard, and Large Language Models for Biomedical Research: Opportunities and Pitfalls.

Ann Biomed Eng. 2023-12

[7]
Using AI-generated suggestions from ChatGPT to optimize clinical decision support.

J Am Med Inform Assoc. 2023-6-20

[8]
GPT-4: a new era of artificial intelligence in medicine.

Ir J Med Sci. 2023-12

[9]
ChatGPT: Can a Natural Language Processing Tool Be Trusted for Radiation Oncology Use?

Int J Radiat Oncol Biol Phys. 2023-8-1

[10]
Large language models and the perils of their hallucinations.

Crit Care. 2023-3-21
