Qian Carolyn, Gao Christina, Park Sang-O, Gim Haelynn, Hou Kelly, Cook Benjamin, Le Jasmin, Stretton Brandon, Maddison John, McCoy Liam, Goh Rudy, Arnold Matthew, Reda Haatem, Kaplan Tamara, Gheihman Galina, Bacchi Stephen
Harvard Medical School, Harvard University, Boston, MA 02138 USA.
Adelaide Medical School, The University of Adelaide, Adelaide, SA 5005 Australia.
Med Sci Educ. 2025 Feb 28;35(3):1169-1171. doi: 10.1007/s40670-025-02343-6. eCollection 2025 Jun.
Large language models (LLMs) may be able to deliver interactive case-based content and score student interactions with such cases. In this study, GPT-4o scores correlated highly with those of expert scorers in the evaluation of medical students' interactions with cases. A difference between LLM scores and expert scores was corrected through calibration.
The online version contains supplementary material available at 10.1007/s40670-025-02343-6.
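The abstract does not say which correlation statistic or calibration method was used. As an illustration only, the following minimal sketch assumes Pearson correlation and a least-squares linear calibration that maps LLM scores onto the expert scale. The function names and all score values are hypothetical and are not taken from the study.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length score lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def fit_linear_calibration(llm_scores, expert_scores):
    """Least-squares fit of expert ~ slope * llm + intercept (assumed method)."""
    n = len(llm_scores)
    mean_x = sum(llm_scores) / n
    mean_y = sum(expert_scores) / n
    slope = sum(
        (x - mean_x) * (y - mean_y) for x, y in zip(llm_scores, expert_scores)
    ) / sum((x - mean_x) ** 2 for x in llm_scores)
    intercept = mean_y - slope * mean_x
    return slope, intercept


def calibrate(score, slope, intercept):
    """Map a raw LLM score onto the expert scale."""
    return slope * score + intercept


# Made-up example scores: the LLM tracks the experts but scores higher overall.
llm_scores = [6.5, 7.0, 8.0, 8.5, 9.5]
expert_scores = [5.0, 6.0, 6.5, 7.5, 8.5]

r = pearson(llm_scores, expert_scores)
slope, intercept = fit_linear_calibration(llm_scores, expert_scores)
calibrated = [calibrate(s, slope, intercept) for s in llm_scores]

print(f"correlation: {r:.3f}")
print(f"calibration: expert = {slope:.3f} * llm + {intercept:.3f}")
print("calibrated LLM scores:", [round(c, 2) for c in calibrated])
```

In this sketch the correlation is unaffected by calibration, because a linear rescaling with positive slope preserves Pearson's r. Only the offset and scale of the LLM scores change. In practice the calibration would be fitted on one set of cases and checked on a separate set.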