Temerty Faculty of Medicine, University of Toronto, Toronto, Ontario, Canada; New York Proton Center, New York, New York.
New York Proton Center, New York, New York.
J Am Coll Radiol. 2024 Nov;21(11):1800-1804. doi: 10.1016/j.jacr.2024.07.011. Epub 2024 Aug 2.
The aim of this study is to assess the accuracy of Chat Generative Pretrained Transformer (ChatGPT) in responding to oncology examination questions in a one-shot learning setting. Consecutive national radiation oncology in-service multiple-choice examinations were collected and input into ChatGPT 4o and ChatGPT 3.5. ChatGPT's answers were then compared with the answer keys to determine whether each question was answered correctly and whether the newer ChatGPT version showed improved performance. A total of 600 consecutive questions were input into ChatGPT. ChatGPT 4o answered 72.2% of questions correctly, whereas ChatGPT 3.5 answered 53.8% correctly. Performance differed significantly by question category (P < .01), with ChatGPT performing worse on questions about landmark studies and on treatment recommendations and planning. ChatGPT is a promising technology, with the latest version showing marked improvement. Although it still has limitations, with further evolution it may become a reliable resource for medical training and decision making in the oncology space.
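The abstract does not specify the statistical test used to compare the two model versions; the following is a minimal, hypothetical Python sketch of one common approach, a chi-square test on a 2 x 2 contingency table of correct versus incorrect answers. The counts are back-calculated from the reported percentages (72.2% and 53.8% of 600 questions) and are illustrative only, not taken from the study data.

    # Hypothetical sketch: chi-square comparison of the two versions' accuracy.
    # Counts are approximated from the published percentages and total N.
    from scipy.stats import chi2_contingency

    TOTAL_QUESTIONS = 600
    correct_4o = round(0.722 * TOTAL_QUESTIONS)  # ~433 correct (assumed)
    correct_35 = round(0.538 * TOTAL_QUESTIONS)  # ~323 correct (assumed)

    # Rows = model version, columns = [correct, incorrect]
    table = [
        [correct_4o, TOTAL_QUESTIONS - correct_4o],
        [correct_35, TOTAL_QUESTIONS - correct_35],
    ]

    chi2, p_value, dof, expected = chi2_contingency(table)
    print(f"chi2 = {chi2:.2f}, p = {p_value:.4g}")

Running this sketch on the approximated counts yields a very small p value, consistent with the abstract's conclusion that the newer version performed markedly better; the per-category analysis (P < .01) would use the same kind of contingency-table approach with one row per question category.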