Department of Psychology, Princeton University, Princeton, USA.
Department of Computer Science, Princeton University, Princeton, USA.
Sci Rep. 2024 Sep 13;14(1):21445. doi: 10.1038/s41598-024-72071-1.
Determining the extent to which the perceptual world can be recovered from language is a longstanding problem in philosophy and cognitive science. We show that state-of-the-art large language models can unlock new insights into this problem by providing a lower bound on the amount of perceptual information that can be extracted from language. Specifically, we elicit pairwise similarity judgments from GPT models across six psychophysical datasets. We show that the judgments are significantly correlated with human data across all domains, recovering well-known representations like the color wheel and pitch spiral. Surprisingly, we find that co-training a model (GPT-4) on vision and language does not necessarily yield improvements specific to the visual modality: its predictions are highly correlated with human data whether it receives direct visual input or purely textual descriptors. To study the impact of specific languages, we also apply the models to a multilingual color-naming task. We find that GPT-4 replicates cross-linguistic variation between English and Russian, illuminating the interaction of language and perception.
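To illustrate the kind of analysis the abstract describes, the sketch below correlates pairwise similarity judgments over a set of stimuli with human ratings. All numbers are made-up placeholders, not data from the paper; in practice each model rating would be elicited by prompting a GPT model for the similarity of a stimulus pair.

```python
# Minimal sketch: correlating model-elicited pairwise similarity judgments
# with human similarity ratings. The ratings below are hypothetical
# placeholders; in the actual study each pair's rating would come from
# prompting a GPT model (e.g., "How similar are these two colors, 0-1?").

from itertools import combinations
from math import sqrt

stimuli = ["red", "orange", "yellow", "green", "blue"]  # hypothetical color terms
pairs = list(combinations(stimuli, 2))  # all 10 unordered stimulus pairs

# Hypothetical ratings, one per pair (0 = dissimilar, 1 = identical)
model_ratings = [0.8, 0.5, 0.2, 0.1, 0.7, 0.3, 0.1, 0.6, 0.2, 0.4]
human_ratings = [0.9, 0.4, 0.2, 0.1, 0.8, 0.2, 0.1, 0.5, 0.3, 0.5]

def pearson(x, y):
    """Pearson correlation between two equal-length rating vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

r = pearson(model_ratings, human_ratings)
print(f"model-human correlation over {len(pairs)} pairs: r = {r:.2f}")
```

A high correlation on such pairwise judgments is the study's evidence that perceptual structure is recoverable from language alone; the paper's actual analyses use six psychophysical datasets rather than toy vectors.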