Vzorin Gleb D, Bukinich Alexey M, Sedykh Anna V, Vetrova Irina I, Sergienko Elena A
Lomonosov Moscow State University, Russia.
Institute of Psychology of Russian Academy of Sciences, Moscow, Russia.
Psychol Russ. 2024 Jun 15;17(2):85-99. doi: 10.11621/pir.2024.0206. eCollection 2024.
Advanced AI models such as the large language model GPT-4 demonstrate sophisticated intellectual capabilities, sometimes exceeding human intellectual performance. However, the emotional competency of these models, along with their underlying mechanisms, has not been sufficiently evaluated.
Our research aimed to explore different emotional intelligence domains in GPT-4 according to the Mayer-Salovey-Caruso model. We also examined whether GPT-4's answer accuracy is consistent with its explanations of those answers.
Sections of the Russian version of the Mayer-Salovey-Caruso Emotional Intelligence Test (MSCEIT) were used in this research, with each question presented as a text prompt in a separate, independent ChatGPT chat, three times each.
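The procedure above can be sketched as a minimal harness: every item goes out as a standalone prompt with no shared conversation history, and the full pass is repeated three times. The names `run_protocol` and `ask` are hypothetical illustrations, not from the paper, and the stand-in answer function takes the place of a real LLM API call.

```python
def run_protocol(items, ask, n_runs=3):
    """Collect answers over n_runs independent passes.

    `ask` is any callable mapping one prompt string to the model's
    answer; each call stands for a fresh chat with no carried-over
    context (hypothetical interface, not the authors' code).
    """
    runs = []
    for _ in range(n_runs):
        # A new list per run: no state is shared between items or runs.
        runs.append([ask(item) for item in items])
    return runs

# Usage with a stand-in model; a real study would call an LLM API here.
items = ["Item 1: identify the emotion ...", "Item 2: rate the mood ..."]
answers = run_protocol(items, ask=lambda prompt: "A")
# answers holds three independent lists, one per run.
```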
The GPT-4 large language model achieved high scores on the Understanding Emotions scale (117, 124, and 128 across the three runs) and the Strategic Emotional Intelligence scale (118, 121, and 122). Average scores were obtained on the Managing Emotions scale (103, 108, and 110 points). However, the Using Emotions to Facilitate Thought scale yielded low and less reliable scores (85, 86, and 88 points). Four types of explanations for the answer choices were identified: Meaningless sentences; Relation declaration; Implicit logic; and Explicit logic. Correct answers were accompanied by all four types of explanations, whereas incorrect answers were followed only by Meaningless sentences or Explicit logic. This distribution aligns with patterns observed in children as they explore and elucidate mental states.
GPT-4 is capable of identifying and managing emotions, but it lacks deep reflexive analysis of emotional experience and the motivational aspect of emotions.