Attanasio Margherita, Mazza Monica, Le Donne Ilenia, Masedu Francesco, Greco Maria Paola, Valenti Marco
Department of Biotechnological and Applied Clinical Sciences, University of L'Aquila, L'Aquila, Italy.
Reference Regional Centre for Autism, Abruzzo Region, Local Health Unit, L'Aquila, Italy.
Front Psychol. 2024 Oct 29;15:1488172. doi: 10.3389/fpsyg.2024.1488172. eCollection 2024.
In recent years, the capabilities of Large Language Models (LLMs), such as ChatGPT, to imitate human behavioral patterns have been attracting growing interest from experimental psychology. Although ChatGPT can successfully generate accurate theoretical and inferential information in several fields, its ability to exhibit a Theory of Mind (ToM) is a topic of debate and interest in the literature. Impairments in ToM are considered responsible for social difficulties in many clinical conditions, such as Autism Spectrum Disorder (ASD). Some studies have shown that ChatGPT can successfully pass classical ToM tasks; however, the response style LLMs use to solve advanced ToM tasks, and how their abilities compare with those of typically developing (TD) individuals and clinical populations, has not been explored. In this preliminary study, we administered the Advanced ToM Test and the Emotion Attribution Task to ChatGPT-3.5 and ChatGPT-4 and compared their responses with those of an ASD group and a TD group. Our results showed that the two LLMs had higher accuracy in understanding mental states, although ChatGPT-3.5 failed with more complex mental states. In understanding emotional states, ChatGPT-3.5 performed significantly worse than TD individuals but did not differ from the ASD group, showing difficulty with negative emotions. ChatGPT-4 achieved higher accuracy, but difficulties with recognizing sadness and anger persisted. The style adopted by both LLMs appeared verbose and repetitive, tending to violate Grice's maxims. This conversational style seems similar to that adopted by high-functioning individuals with ASD. Clinical implications and potential applications are discussed.