

The Performance of AI in Dermatology Exams: The Exam Success and Limits of ChatGPT.

Author Information

Göçer Gürok Neşe, Öztürk Savaş

Affiliations

Department of Dermatology, Elazığ Fethi Sekin City Health Application and Research Center, University of Health Sciences, Elazig, Turkey.

Publication Information

J Cosmet Dermatol. 2025 May;24(5):e70244. doi: 10.1111/jocd.70244.

Abstract

BACKGROUND

Artificial intelligence holds significant potential in dermatology.

OBJECTIVES

This study aimed to explore the potential and limitations of artificial intelligence applications in dermatology education by evaluating ChatGPT's performance on questions from the dermatology residency exam.

METHOD

In this study, the dermatology residency exam performance of ChatGPT versions 3.5 and 4.0 was compared with that of resident doctors across seniority levels. Dermatology residents were categorized into four seniority levels based on their education, and the exam comprised 100 questions in total (25 multiple-choice questions per seniority level). The same questions were administered to ChatGPT versions 3.5 and 4.0, and the scores were analyzed statistically.
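The abstract reports p-values but does not name the statistical test used. For illustration only, the Python sketch below shows one plausible way to compare a model's exam accuracy with a resident group's: pooling per-question answers into correct/incorrect counts and applying Fisher's exact test via SciPy. The function compare_accuracy and all counts are hypothetical, not taken from the study.

    # Illustrative sketch only; the authors' actual statistical method is
    # not stated in the abstract. Assumes per-question answers are pooled
    # into correct/incorrect counts and compared with Fisher's exact test.
    from scipy.stats import fisher_exact

    def compare_accuracy(ai_correct, ai_total, group_correct, group_total):
        """Compare the model's per-question accuracy against a resident
        group's pooled accuracy via a 2x2 contingency table."""
        table = [
            [ai_correct, ai_total - ai_correct],
            [group_correct, group_total - group_correct],
        ]
        odds_ratio, p_value = fisher_exact(table)
        return odds_ratio, p_value

    # Hypothetical counts: the model answers 15 of 25 fourth-year questions
    # correctly, while five fourth-year residents pool 100 of 125 correct.
    print(compare_accuracy(15, 25, 100, 125))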

RESULTS

ChatGPT 3.5 performed poorly, especially compared with more senior residents: second-year (p = 0.038), third-year (p = 0.041), and fourth-year residents (p = 0.020) scored significantly higher than ChatGPT 3.5. ChatGPT 4.0 performed similarly to first- and third-year residents, but worse than second-year (p = 0.037) and fourth-year residents (p = 0.029). Both versions scored lower as resident seniority and exam difficulty increased. ChatGPT 3.5 passed the first- and second-year exams but failed the third- and fourth-year exams; ChatGPT 4.0 passed the first-, second-, and third-year exams but failed the fourth-year exam. These findings suggest that ChatGPT was not on par with senior residents, particularly on topics requiring advanced knowledge, although version 4.0 proved more effective than version 3.5.

CONCLUSION

In the future, as ChatGPT's language support and medical knowledge improve, it may be used more effectively in educational settings.


Figure: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/dd78/12087418/9a54062a4230/JOCD-24-e70244-g001.jpg
