Santavirta Severi, Wu Yuhang, Suominen Lauri, Nummenmaa Lauri
Turku PET Centre, University of Turku, Turku, Finland.
Turku University Hospital, Turku, Finland.
Imaging Neurosci (Camb). 2025 Sep 2;3. doi: 10.1162/IMAG.a.134. eCollection 2025.
Humans navigate the social world by rapidly perceiving social features from other people and their interactions. Recently, large language models (LLMs) have achieved high-level visual capabilities for detailed recognition and description of object and scene content. This raises the question of whether LLMs can infer complex social information from images and videos, and whether the high-dimensional structure of their feature annotations aligns with that of humans. We collected evaluations of 138 social features from GPT-4V for images (N = 468) and videos (N = 234) derived from social movie scenes. These evaluations were compared with human evaluations (N = 2,254). The comparisons established that GPT-4V can achieve human-like capabilities in annotating individual social features. The GPT-4V social feature annotations also show a structural representation similar to the human social perceptual structure (i.e., a similar correlation matrix over all social feature annotations). Finally, we modeled hemodynamic responses (N = 97) to viewing socioemotional movie clips using feature annotations from human observers and from GPT-4V. These results demonstrated that GPT-4V-based stimulus models can also reveal the social perceptual network in the human brain, and that this network is highly similar to the one revealed by stimulus models based on human annotations. These human-like annotation capabilities of LLMs could have a wide range of real-life applications, from health care to business, and could open exciting new avenues for psychological and neuroscientific research.
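The two behavioural comparisons summarized above can be illustrated with a short sketch. It covers agreement on individual features, meaning a per-feature correlation between human and GPT-4V ratings across stimuli. It also covers structural similarity, meaning a correlation between the upper triangles of the two feature-by-feature correlation matrices. This is not the authors' pipeline. It assumes both rating sets are already aggregated into arrays of shape (stimuli, features) and uses plain Pearson correlations. The function names `feature_agreement` and `structural_similarity`, along with the synthetic data, are hypothetical.

```python
import numpy as np

def feature_agreement(human: np.ndarray, model: np.ndarray) -> np.ndarray:
    """Hypothetical helper: Pearson r between human and model ratings for each feature.

    Both inputs have shape (n_stimuli, n_features); returns shape (n_features,).
    A feature with zero variance in either input yields NaN.
    """
    h = human - human.mean(axis=0)  # center each feature (column)
    m = model - model.mean(axis=0)
    num = (h * m).sum(axis=0)
    den = np.sqrt((h ** 2).sum(axis=0) * (m ** 2).sum(axis=0))
    return num / den

def structural_similarity(human: np.ndarray, model: np.ndarray) -> float:
    """Hypothetical helper: correlate the feature-by-feature correlation structures.

    Builds each rater's (n_features x n_features) correlation matrix and
    correlates their off-diagonal upper triangles (one value per feature pair).
    """
    c_h = np.corrcoef(human, rowvar=False)  # columns are features
    c_m = np.corrcoef(model, rowvar=False)
    iu = np.triu_indices_from(c_h, k=1)     # skip the diagonal of ones
    return float(np.corrcoef(c_h[iu], c_m[iu])[0, 1])

if __name__ == "__main__":
    # Synthetic, made-up data: 468 "images" x 138 "features"; the model
    # ratings are the human ratings plus independent noise.
    rng = np.random.default_rng(0)
    human = rng.normal(size=(468, 138))
    model = human + 0.5 * rng.normal(size=human.shape)
    r = feature_agreement(human, model)
    print("mean per-feature r:", r.mean())
    print("structural similarity:", structural_similarity(human, model))
```

In practice, a significance test for matrix-level similarity would typically use a permutation scheme such as a Mantel test, because the feature pairs are not independent. The sketch reports only the raw correlation.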