Belén López-Pérez, Yuhui Chen, Xiuhui Li, Shixing Cheng, Pooya Razavi
School of Health Sciences, University of Manchester.
Department of Psychology, University of Oregon.
Emotion. 2025 Apr 17. doi: 10.1037/emo0001528.
Interpersonal emotion regulation involves using diverse strategies to influence others' emotions, commonly assessed with questionnaires. However, this method may be less effective for individuals with limited literacy or introspection skills. To address this, recent studies have adopted narrative-based approaches, though these require time-intensive qualitative analysis. Given the potential of artificial intelligence (AI) and large language models (LLMs) for information classification, we evaluated the feasibility of using AI to categorize interpersonal emotion regulation strategies. We conducted two studies in which we compared AI performance against human coding in identifying regulation strategies from narrative data. In Study 1, with 2,824 responses, ChatGPT initially achieved kappa values above .47. Refining the prompts (i.e., coding instructions) improved consistency between ChatGPT and human coders (κ > .79). In Study 2, the refined prompts demonstrated comparable accuracy (κ > .76) when analyzing a new set of responses (n = 2,090), using both ChatGPT and Claude. Additional evaluations of the LLMs' performance using different accuracy metrics revealed notable variability in the models' capabilities when interpreting narratives across different emotions and regulation strategies. These results point to the strengths and limitations of LLMs in classifying regulation strategies, and to the importance of prompt engineering and validation. (PsycInfo Database Record (c) 2025 APA, all rights reserved).
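The agreement statistic reported above, Cohen's kappa, corrects raw percent agreement for agreement expected by chance given each coder's label frequencies. A minimal sketch of how AI-versus-human agreement on strategy labels could be computed (the strategy labels in the example are hypothetical, not the study's actual coding scheme):

```python
from collections import Counter

def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa between two raters over the same items.

    kappa = (p_observed - p_expected) / (1 - p_expected), where
    p_expected is the chance agreement implied by each rater's
    marginal label frequencies.
    """
    assert len(rater_a) == len(rater_b) and rater_a
    n = len(rater_a)
    # Observed proportion of items both raters labeled identically.
    p_obs = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Expected chance agreement from the two marginal distributions.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    p_exp = sum(counts_a[lab] * counts_b[lab]
                for lab in set(counts_a) | set(counts_b)) / (n * n)
    return (p_obs - p_exp) / (1 - p_exp)

# Hypothetical example: human codes vs. LLM codes for four narratives.
human = ["distraction", "reappraisal", "reappraisal", "suppression"]
llm   = ["distraction", "reappraisal", "suppression", "suppression"]
print(round(cohen_kappa(human, llm), 3))  # 0.636
```

Note that kappa can be low even with high raw agreement when one category dominates, which is one reason the study's per-emotion and per-strategy breakdowns matter.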