Assessing ChatGPT for Clinical Decision-Making in Radiation Oncology, With Open-Ended Questions and Images.
Author Information
Chuang Wei-Kai, Kao Yung-Shuo, Liu Yen-Ting, Lee Cho-Yin
Affiliations
Department of Radiation Oncology, Shuang Ho Hospital, Taipei Medical University, New Taipei City, Taiwan; Department of Biomedical Imaging and Radiological Sciences, National Yang Ming Chiao Tung University, Taipei, Taiwan.
Department of Radiation Oncology, Taoyuan General Hospital, Ministry of Health and Welfare, Taoyuan, Taiwan.
Publication Information
Pract Radiat Oncol. 2025 Apr 29. doi: 10.1016/j.prro.2025.04.009.
PURPOSE
This study assesses the practicality and correctness of Chat Generative Pre-trained Transformer (ChatGPT)-4 and 4O's answers to clinical inquiries in radiation oncology, and evaluates ChatGPT-4O for staging nasopharyngeal carcinoma (NPC) cases with magnetic resonance (MR) images.
METHODS AND MATERIALS
A total of 164 open-ended questions covering representative professional domains (Clinical_G: knowledge of standardized guidelines; Clinical_C: complex clinical scenarios; Nursing: nursing and health education; and Technology: radiation technology and dosimetry) were prospectively formulated by experts and presented to ChatGPT-4 and 4O. Each answer was graded as 1 (directly practical for clinical decision-making), 2 (correct but inadequate), 3 (a mix of correct and incorrect information), or 4 (completely incorrect). ChatGPT-4O was presented with representative diagnostic MR images of 20 patients with NPC across different T stages and asked to determine the T stage of each case.
RESULTS
The proportion of ChatGPT's answers that were directly practical (grade 1) varied across professional domains (P < .01), being higher in the Nursing (GPT-4: 91.9%; GPT-4O: 94.6%) and Clinical_G (GPT-4: 82.2%; GPT-4O: 88.9%) domains than in the Clinical_C (GPT-4: 54.1%; GPT-4O: 62.2%) and Technology (GPT-4: 64.4%; GPT-4O: 77.8%) domains. The proportion of correct (grade 1+2) answers (GPT-4: 89.6%; GPT-4O: 98.8%; P < .01) was uniformly high across all professional domains. However, ChatGPT-4O failed to stage the NPC cases from MR images, indiscriminately assigning T4 even to all cases that were actually non-T4 (κ = 0; 95% CI, -0.253 to 0.253).
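The κ = 0 result is exactly what Cohen's kappa produces when a rater assigns a single class to every case: observed agreement then equals chance agreement. A minimal sketch below illustrates this; the 20-case stage distribution is an assumption for illustration, not the study's actual data.

```python
# Illustration: Cohen's kappa is exactly 0 when one rater assigns a single
# class (here, T4) to every case, whatever the true labels are.
# The ground-truth distribution below is hypothetical, not the study's data.
from collections import Counter

def cohens_kappa(truth, pred):
    """Unweighted Cohen's kappa for two equal-length label sequences."""
    n = len(truth)
    # observed agreement: fraction of cases where the labels match
    p_o = sum(t == p for t, p in zip(truth, pred)) / n
    # expected (chance) agreement from the two marginal distributions
    t_freq, p_freq = Counter(truth), Counter(pred)
    p_e = sum(t_freq[c] * p_freq.get(c, 0) for c in t_freq) / n**2
    return (p_o - p_e) / (1 - p_e)

# Assumed ground-truth T stages for 20 NPC cases (5 per stage)
truth = ["T1"] * 5 + ["T2"] * 5 + ["T3"] * 5 + ["T4"] * 5
pred = ["T4"] * 20  # the model calls every case T4

print(cohens_kappa(truth, pred))  # -> 0.0
```

With all-T4 predictions, both observed and chance agreement equal the prevalence of true T4 cases (0.25 here), so the numerator vanishes regardless of how the true stages are distributed.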
CONCLUSIONS
ChatGPT could serve as a safe clinical decision-support tool in radiation oncology, because it correctly answered the vast majority of clinical inquiries across professional domains. However, its clinical practicality should be weighed cautiously, particularly in the Clinical_C and Technology domains. ChatGPT-4O is not yet mature enough to interpret diagnostic images for cancer staging.