

The Performance of AI in Dermatology Exams: The Exam Success and Limits of ChatGPT.

Author Information

Göçer Gürok Neşe, Öztürk Savaş

Affiliations

Department of Dermatology, Elazığ Fethi Sekin City Health Application and Research Center, University of Health Sciences, Elazig, Turkey.

Publication Information

J Cosmet Dermatol. 2025 May;24(5):e70244. doi: 10.1111/jocd.70244.

Abstract

BACKGROUND

Artificial intelligence holds significant potential in dermatology.

OBJECTIVES

This study aimed to explore the potential and limitations of artificial intelligence applications in dermatology education by evaluating ChatGPT's performance on questions from the dermatology residency exam.

METHOD

In this study, the dermatology residency exam performance of ChatGPT versions 3.5 and 4.0 was compared with that of resident doctors across seniority levels. Dermatology residents were categorized into four seniority levels based on their education, and the exam comprised 100 questions in total (25 multiple-choice questions per seniority level). The same questions were administered to ChatGPT versions 3.5 and 4.0, and the scores were analyzed statistically.
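The abstract reports p-values but does not name the statistical test used. For illustration only, the Python sketch below shows one plausible way to compare a model's exam accuracy with a resident group's: pooling per-question answers into correct/incorrect counts and applying Fisher's exact test via SciPy. The function compare_accuracy and all counts are hypothetical, not taken from the study.

    # Illustrative sketch only; the authors' actual statistical method is
    # not stated in the abstract. Assumes per-question answers are pooled
    # into correct/incorrect counts and compared with Fisher's exact test.
    from scipy.stats import fisher_exact

    def compare_accuracy(ai_correct, ai_total, group_correct, group_total):
        """Compare the model's per-question accuracy against a resident
        group's pooled accuracy via a 2x2 contingency table."""
        table = [
            [ai_correct, ai_total - ai_correct],
            [group_correct, group_total - group_correct],
        ]
        odds_ratio, p_value = fisher_exact(table)
        return odds_ratio, p_value

    # Hypothetical counts: the model answers 15 of 25 fourth-year questions
    # correctly, while five fourth-year residents pool 100 of 125 correct.
    print(compare_accuracy(15, 25, 100, 125))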

RESULTS

ChatGPT 3.5 performed poorly, especially compared with more senior residents: second-year (p = 0.038), third-year (p = 0.041), and fourth-year residents (p = 0.020) scored significantly higher than ChatGPT 3.5. ChatGPT 4.0 performed similarly to first- and third-year residents, but worse than second-year (p = 0.037) and fourth-year residents (p = 0.029). Both versions scored lower as resident seniority and exam difficulty increased. ChatGPT 3.5 passed the first- and second-year exams but failed the third- and fourth-year exams; ChatGPT 4.0 passed the first-, second-, and third-year exams but failed the fourth-year exam. These findings suggest that ChatGPT was not on par with senior residents, particularly on topics requiring advanced knowledge, although version 4.0 proved more effective than version 3.5.

CONCLUSION

In the future, as ChatGPT's language support and medical knowledge improve, it may be used more effectively in educational settings.


Figure: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/dd78/12087418/9a54062a4230/JOCD-24-e70244-g001.jpg
