Wraith Constance, Carnegy Alasdair, Brown Celia, Baptista Ana, Sam Amir H
Imperial College School of Medicine, London, UK.
Med Educ. 2025 Jul 2. doi: 10.1111/medu.15750.
Reflection is integral to the modern doctor's practice and, whilst it can take many forms, written reflection is commonly found in medical school curricula. Generative artificial intelligence (GenAI) is increasingly being used, including in the completion of written assignments in medical curricula. We sought to explore whether educators can distinguish between GenAI- and student-authored reflections, and what features they use to do so.
This was a mixed-methods study. Twenty-eight educators attended a 'think aloud' interview and were presented with a set of four reflections, either all authored by students, all authored by GenAI, or a mixture. They were asked to identify who they thought had written each reflection, speaking aloud whilst they did so. Sensitivity (the proportion of GenAI-authored reflections correctly identified) and specificity (the proportion of student-authored reflections correctly identified) were then calculated, and the interview transcripts were analysed using thematic analysis.
Educators were unable to reliably distinguish between student and GenAI-authored reflections. Sensitivity across the four reflections ranged from 0.36 (95% CI: 0.16-0.61) to 0.64 (95% CI: 0.39-0.84). Specificity ranged from 0.64 (95% CI: 0.39-0.84) to 0.86 (95% CI: 0.60-0.96). Thematic analysis revealed three main themes when considering what features of the reflection educators used to make judgements about authorship: features of writing, features of reflection and educators' preconceptions and experiences.
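The accuracy measures reported above can be illustrated with a minimal sketch. The helper functions, counts, and the use of a Wilson score interval below are assumptions for illustration only; the paper does not state which interval method or which per-reflection denominators were used, so these numbers are not a reproduction of the study's results.

```python
import math


def wilson_interval(k: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Approximate 95% Wilson score confidence interval for a proportion k/n.

    (Illustrative choice of method; the study may have used a different one.)
    """
    p = k / n
    denom = 1 + z ** 2 / n
    centre = (p + z ** 2 / (2 * n)) / denom
    half = (z / denom) * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2))
    return max(0.0, centre - half), min(1.0, centre + half)


def sensitivity(correct_ai_calls: int, total_ai_reflections: int) -> float:
    # A "positive" is a GenAI-authored reflection: sensitivity is the
    # proportion of GenAI reflections correctly identified as GenAI.
    return correct_ai_calls / total_ai_reflections


def specificity(correct_student_calls: int, total_student_reflections: int) -> float:
    # A "negative" is a student-authored reflection: specificity is the
    # proportion of student reflections correctly identified as student work.
    return correct_student_calls / total_student_reflections


# Hypothetical counts (not from the paper) for one reflection:
sens = sensitivity(10, 28)
lo, hi = wilson_interval(10, 28)
print(f"sensitivity = {sens:.2f} (95% CI: {lo:.2f}-{hi:.2f})")
```

A CI that spans 0.5, as several of the reported intervals do, is consistent with judgements no better than chance.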
This study demonstrates the challenges of differentiating between student- and GenAI-authored reflections and highlights the range of factors that influence this judgement. Rather than developing ways to make this distinction more accurately, or trying to stop students using GenAI, we suggest it could instead be harnessed to teach students reflective practice skills and to help students for whom written reflection in particular may be challenging.