Levkovich Inbar, Rabin Eyal, Farraj Rania Hussein, Elyoseph Zohar
Faculty of Education, Tel Hai College, Upper Galilee, Israel.
Department of Psychology and Education, The Open University of Israel, Israel.
Res Dev Disabil. 2025 May;160:104970. doi: 10.1016/j.ridd.2025.104970. Epub 2025 Mar 15.
This study explored differences in the attributional patterns of four advanced artificial intelligence (AI) Large Language Models (LLMs): ChatGPT3.5, ChatGPT4, Claude, and Gemini. The analysis focused on feedback, frustration, sympathy, and expectations of future failure among students with and without learning disabilities (LD). These findings were compared with responses from a sample of Australian and Chinese trainee teachers nearing qualification, who had varied demographic and educational backgrounds. Eight vignettes depicting students with varying abilities and levels of effort were each evaluated ten times by each LLM, yielding 320 evaluations, and trainee teachers provided comparable ratings. For LD students, the LLMs exhibited lower frustration and higher sympathy than the trainee teachers. For non-LD students, the LLMs similarly showed lower frustration, with ChatGPT3.5 aligning closely with the Chinese teachers and ChatGPT4 demonstrating more sympathy than both teacher groups. Notably, the LLMs expressed lower expectations of future academic failure for both LD and non-LD students than the trainee teachers did. Regarding feedback, the findings reflect ratings of the qualitative nature of the feedback that LLMs and teachers would provide, rather than actual feedback text. The LLMs, particularly ChatGPT3.5 and Gemini, were rated as providing more negative feedback than the trainee teachers, whereas ChatGPT4 provided more positive ratings for both LD and non-LD students, aligning with the Chinese teachers in some cases. These findings suggest that LLMs may promote a positive and inclusive outlook for LD students by exhibiting lower judgmental tendencies and greater optimism. However, their tendency to rate feedback more negatively than trainee teachers highlights the need to recalibrate AI tools to better align with cultural and emotional nuances.