
Correspondence of high dimensional emotion structures elicited from video clips between humans and multimodal LLMs.

Author Information

Asanuma Haruka, Koide-Majima Naoko, Nakamura Ken, Horii Takato, Nishimoto Shinji, Oizumi Masafumi

Affiliations

Graduate School of Arts and Sciences, The University of Tokyo, Tokyo, 153-8902, Japan.

Center for Information and Neural Networks (CiNet), National Institute of Information and Communications Technology, Osaka, 565-0871, Japan.

Publication Information

Sci Rep. 2025 Sep 1;15(1):32175. doi: 10.1038/s41598-025-14961-6.

Abstract

Recent studies have revealed that human emotions exhibit a high-dimensional, complex structure. Fully capturing this complexity requires new approaches, as conventional models that disregard high dimensionality risk overlooking key nuances of human emotions. Here, we examined the extent to which the latest generation of rapidly evolving Multimodal Large Language Models (MLLMs) captures these high-dimensional, intricate emotion structures, assessing both their capabilities and limitations. Specifically, we compared self-reported emotion ratings from participants watching videos with estimates generated by models such as Gemini and GPT. We evaluated performance not only at the individual video level but also at the level of emotion structures that account for inter-video relationships. At the level of simple correlation between emotion structures, our results demonstrated strong similarity between human and model-inferred emotion structures. To further explore whether the similarity between humans and models holds at the single-item level or only at the coarse-category level, we applied Gromov-Wasserstein Optimal Transport. We found that although performance was not necessarily high at the strict, single-item level, performance across video categories that elicit similar emotions was substantial, indicating that the model could infer human emotional experiences at the coarse-category level. Our results suggest that current state-of-the-art MLLMs broadly capture complex, high-dimensional emotion structures at the coarse-category level, while also revealing their limitations in accurately capturing the entire structure at the single-item level.
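The abstract's first analysis, correlating human and model emotion structures, can be sketched as follows. This is a minimal RSA-style illustration with synthetic data; the array shapes, the Euclidean dissimilarity metric, and the noise model are assumptions for demonstration, not the paper's actual pipeline.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: per-video emotion ratings for humans, and a noisy
# "model estimate" standing in for MLLM-generated ratings.
n_videos, n_emotions = 6, 4
human = rng.random((n_videos, n_emotions))
model = human + 0.1 * rng.standard_normal(human.shape)

def dissimilarity(ratings):
    """Pairwise Euclidean distances between the videos' rating vectors."""
    diff = ratings[:, None, :] - ratings[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

d_human = dissimilarity(human)
d_model = dissimilarity(model)

# Correlate the upper triangles of the two dissimilarity matrices:
# a simple structure-level similarity measure between human and model.
iu = np.triu_indices(n_videos, k=1)
r = np.corrcoef(d_human[iu], d_model[iu])[0, 1]
print(f"structure correlation: {r:.2f}")
```

The paper's stricter test, Gromov-Wasserstein Optimal Transport, goes further by searching for an optimal alignment between the two structures without assuming which video corresponds to which; that step would require an OT solver and is not reproduced here.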


Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/df0c/12402258/f3c77b9678f0/41598_2025_14961_Fig1_HTML.jpg
