Faculté de Médecine de Tunis, Université de Tunis El Manar, Tunis, Tunisia.
JMIR Med Educ. 2024 Jul 23;10:e52818. doi: 10.2196/52818.
BACKGROUND: The rapid evolution of ChatGPT has generated substantial interest and led to extensive discussion in both public and academic domains, particularly in the context of medical education.
OBJECTIVE: This study aimed to evaluate ChatGPT's performance in a pulmonology examination through a comparative analysis with that of third-year medical students.
METHODS: In this cross-sectional study, we conducted a comparative analysis with 2 distinct groups. The first group comprised 244 third-year medical students who had previously taken our institution's 2020 pulmonology examination, which was conducted in French. The second group involved ChatGPT-3.5 in 2 separate sets of conversations: without contextualization (V1) and with contextualization (V2). In both V1 and V2, ChatGPT received the same set of questions administered to the students.
RESULTS: V1 demonstrated exceptional proficiency in radiology, microbiology, and thoracic surgery, surpassing the majority of medical students in these domains. However, it faced challenges in pathology, pharmacology, and clinical pneumology. In contrast, V2 consistently delivered more accurate responses across question categories, regardless of the specialization. ChatGPT performed worse than medical students on multiple-choice questions. V2 excelled in responding to structured open-ended questions. Both ChatGPT conversations, particularly V2, outperformed students on questions of low and intermediate difficulty. Interestingly, students performed better on highly challenging questions. V1 did not pass the examination. Conversely, V2 passed the examination, outperforming 139 (62.1%) medical students.
CONCLUSIONS: While ChatGPT has access to a comprehensive web-based data set, its performance closely mirrors that of an average medical student. Outcomes are influenced by question format, item complexity, and contextual nuances. The model faces challenges in medical contexts requiring information synthesis, advanced analytical aptitude, and clinical judgment, as well as in non-English language assessments and when confronted with data outside mainstream internet sources.