Temerty Faculty of Medicine, University of Toronto, Toronto, Ontario, Canada; New York Proton Center, New York, New York.
New York Proton Center, New York, New York.
J Am Coll Radiol. 2024 Nov;21(11):1800-1804. doi: 10.1016/j.jacr.2024.07.011. Epub 2024 Aug 2.
The aim of this study is to assess the accuracy of Chat Generative Pretrained Transformer (ChatGPT) in responding to oncology examination questions in a one-shot learning setting. Consecutive national radiation oncology in-service multiple-choice examinations were collected and input into ChatGPT 4o and ChatGPT 3.5. ChatGPT's answers were then compared with the answer keys to determine whether each question was answered correctly and whether the newer ChatGPT version showed improved performance. A total of 600 consecutive questions were input into ChatGPT. ChatGPT 4o answered 72.2% of questions correctly, whereas ChatGPT 3.5 answered 53.8% correctly. Performance differed significantly by question category (P < .01), with ChatGPT performing worse on questions about landmark studies and on treatment recommendations and planning. ChatGPT is a promising technology, with the latest version showing marked improvement. Although it still has limitations, with further evolution it may become a reliable resource for medical training and decision making in the oncology space.
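The abstract does not specify the statistical test used to compare the two model versions; the following is a minimal, hypothetical Python sketch of one common approach, a chi-square test on a 2 x 2 contingency table of correct versus incorrect answers. The counts are back-calculated from the reported percentages (72.2% and 53.8% of 600 questions) and are illustrative only, not taken from the study data.

    # Hypothetical sketch: chi-square comparison of the two versions' accuracy.
    # Counts are approximated from the published percentages and total N.
    from scipy.stats import chi2_contingency

    TOTAL_QUESTIONS = 600
    correct_4o = round(0.722 * TOTAL_QUESTIONS)  # ~433 correct (assumed)
    correct_35 = round(0.538 * TOTAL_QUESTIONS)  # ~323 correct (assumed)

    # Rows = model version, columns = [correct, incorrect]
    table = [
        [correct_4o, TOTAL_QUESTIONS - correct_4o],
        [correct_35, TOTAL_QUESTIONS - correct_35],
    ]

    chi2, p_value, dof, expected = chi2_contingency(table)
    print(f"chi2 = {chi2:.2f}, p = {p_value:.4g}")

Running this sketch on the approximated counts yields a very small p value, consistent with the abstract's conclusion that the newer version performed markedly better; the per-category analysis (P < .01) would use the same kind of contingency-table approach with one row per question category.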