Kalam Kazi A, Masoud Fadi D, Muntaser Adam, Ranga Raghav, Geng Xue, Goyal Munish
Department of Medicine, Georgetown University School of Medicine, Washington, DC, USA.
Department of Emergency Medicine, Wayne State University, Detroit, USA.
Cureus. 2025 Jun 11;17(6):e85767. doi: 10.7759/cureus.85767. eCollection 2025 Jun.
As artificial intelligence (AI) tools like ChatGPT become increasingly widespread in medical education, it is essential to evaluate their effectiveness in enhancing students' academic performance and retention compared with traditional educational resources such as lecture materials and textbooks.
We aim to determine whether ChatGPT-4.0 improves medical students' short-term academic performance compared to internal institutional resources and publicly available online materials.
This study was a single-center, prospective, randomized controlled trial conducted over a two-week period in April 2025.
The research took place at Georgetown University School of Medicine.
A total of 198 first-year medical students were invited to participate, with 33 students enrolling in the study. Participants (N = 33) were assigned to one of three groups: Group A (ChatGPT-4.0), Group B (external resources, including publicly available online materials such as Google, PubMed, and third-party educational websites, but excluding AI-assisted tools), and Group C (institutional resources such as electronic textbooks and lecture materials).
Randomization produced Group A (ChatGPT-4.0; N = 10, 30.3%), Group B (external resources; N = 12, 36.4%), and Group C (institutional resources; N = 11, 33.3%). Participants completed an initial multiple-choice quiz using their assigned resources, followed by a post-quiz survey. One week later, they retook the same quiz without access to any resources to assess retention. The primary outcome was the initial quiz score; the secondary outcome was the retention score, evaluated through the second quiz taken without resources.
Initial quiz scores (Week 1) were significantly higher in Group A (N = 10, mean = 9.60 ± 0.52) and Group B (N = 12, mean = 9.08 ± 0.79) compared to Group C (N = 11, mean = 6.64 ± 1.57) (p < 0.001). However, retention scores one week later (Week 2) showed no significant differences among the groups: Group A (N = 10, mean = 6.20 ± 1.93), Group B (N = 12, mean = 5.58 ± 2.07), and Group C (N = 11, mean = 4.36 ± 2.01) (p = 0.118).
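The between-group comparisons can be checked from the summary statistics alone. Assuming a standard one-way ANOVA (the abstract does not name the test used), the F statistics and p-values reconstructed from the reported group sizes, means, and SDs agree closely with the reported values. This is an illustrative sketch, not the authors' analysis code:

```python
def anova_from_summary(ns, means, sds):
    """One-way ANOVA F statistic computed from per-group n, mean, and SD."""
    k, n_total = len(ns), sum(ns)
    grand = sum(n * m for n, m in zip(ns, means)) / n_total
    ss_between = sum(n * (m - grand) ** 2 for n, m in zip(ns, means))
    ss_within = sum((n - 1) * s ** 2 for n, s in zip(ns, sds))
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))

def f_sf_df1_2(f_stat, df2):
    """P(F > f) for an F distribution with numerator df = 2 (closed form)."""
    return (1 + 2 * f_stat / df2) ** (-df2 / 2)

ns = [10, 12, 11]  # Groups A, B, C; total N = 33, so df = (2, 30)
f1 = anova_from_summary(ns, [9.60, 9.08, 6.64], [0.52, 0.79, 1.57])  # Week 1
f2 = anova_from_summary(ns, [6.20, 5.58, 4.36], [1.93, 2.07, 2.01])  # Week 2
p1, p2 = f_sf_df1_2(f1, 30), f_sf_df1_2(f2, 30)
print(f"Week 1: F = {f1:.2f}, p = {p1:.1e}")  # p well below 0.001
print(f"Week 2: F = {f2:.2f}, p = {p2:.3f}")  # close to the reported 0.118
```

The small discrepancy in the Week 2 p-value (about 0.117 versus the reported 0.118) reflects rounding in the published means and SDs, not a different test.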
ChatGPT-4.0 improves short-term academic performance but does not provide a short-term retention advantage over institutional or external online educational resources. These findings demonstrate the potential of AI tools to enhance short-term learning outcomes while emphasizing the need for further research to evaluate their long-term effectiveness in educational settings.