Çiçek Feray Ekin, Ülker Müşerref, Özer Menekşe, Kıyak Yavuz Selim
Faculty of Medicine, Gazi University, Ankara 06500, Turkiye.
Department of Medical Education and Informatics, Faculty of Medicine, Gazi University, Ankara 06500, Turkiye.
Postgrad Med J. 2025 Apr 22;101(1195):458-463. doi: 10.1093/postmj/qgae170.
To evaluate the effectiveness of ChatGPT-generated feedback compared to expert-written feedback in improving clinical reasoning skills among first-year medical students.
This randomized controlled trial was conducted at a single medical school and involved 129 first-year medical students who were randomly assigned to two groups. Over five consecutive days, as spaced repetition, both groups completed three formative tests with feedback on urinary tract infections (UTIs; uncomplicated, complicated, pyelonephritis), receiving either expert-written feedback (control, n = 65) or ChatGPT-generated feedback (experiment, n = 64). Clinical reasoning skills were assessed using Key-Features Questions (KFQs) immediately after the intervention and 10 days later. Students' critical approach to artificial intelligence (AI) was also measured before and after the AI involvement in feedback generation was disclosed.
There was no significant difference between the mean scores of the control group (immediate: 78.5 ± 20.6, delayed: 78.0 ± 21.2) and the experiment group (immediate: 74.7 ± 15.1, delayed: 76.0 ± 14.5) in overall performance on the KFQs (out of 120 points), either immediately (P = .26) or after 10 days (P = .57), with small effect sizes. However, the control group outperformed the ChatGPT group in complicated UTI cases (P < .001). After the AI involvement was disclosed, the experiment group showed a significantly more critical approach to AI, with medium-large effect sizes.
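As an illustrative check of the "small effect size" reported for the immediate overall comparison, a standardized mean difference can be recomputed from the summary statistics above. This assumes a pooled-SD Cohen's d (the study may have reported a different effect-size statistic); the short Python sketch below only reproduces that arithmetic and is not part of the original analysis.

    from math import sqrt

    # Reported immediate KFQ scores (out of 120 points)
    n_control, mean_control, sd_control = 65, 78.5, 20.6  # expert-written feedback
    n_chatgpt, mean_chatgpt, sd_chatgpt = 64, 74.7, 15.1  # ChatGPT-generated feedback

    # Pooled standard deviation across the two independent groups
    pooled_sd = sqrt(((n_control - 1) * sd_control**2 + (n_chatgpt - 1) * sd_chatgpt**2)
                     / (n_control + n_chatgpt - 2))

    # Cohen's d: standardized mean difference (assumed effect-size measure)
    cohens_d = (mean_control - mean_chatgpt) / pooled_sd
    print(f"pooled SD = {pooled_sd:.1f}, Cohen's d = {cohens_d:.2f}")  # d is about 0.21, a small effect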
ChatGPT-generated feedback can be an effective alternative to expert feedback for improving clinical reasoning skills in medical students, particularly in resource-constrained settings with limited expert availability. However, AI-generated feedback may lack the nuance needed for more complex cases, emphasizing the need for expert review. Additionally, exposure to the drawbacks of AI-generated feedback can strengthen students' critical approach to AI-generated educational content.