Xu Qihui, Peng Yingying, Nastase Samuel A, Chodorow Martin, Wu Minghua, Li Ping
Department of Psychology, Ohio State University, Columbus, OH, USA.
Department of Chinese and Bilingual Studies, Faculty of Humanities, The Hong Kong Polytechnic University, Hong Kong SAR, China.
Nat Hum Behav. 2025 Jun 4. doi: 10.1038/s41562-025-02203-8.
To what extent can language give rise to complex conceptual representation? Is multisensory experience essential? Recent large language models (LLMs) challenge the necessity of grounding for concept formation, raising the question of whether LLMs without grounding nevertheless exhibit human-like representations. Here we compare multidimensional representations of ~4,442 lexical concepts between humans (the Glasgow Norms, N = 829; and the Lancaster Norms, N = 3,500) and state-of-the-art LLMs with and without visual learning, across non-sensorimotor, sensory and motor domains. We found that (1) the similarity between model and human representations decreases from non-sensorimotor to sensory domains and is minimal in motor domains, indicating a systematic divergence, and (2) models with visual learning exhibit enhanced similarity with human representations in visual-related dimensions. These results highlight the potential limitations of language in isolation for LLMs and suggest that integrating diverse modalities can potentially enhance alignment with human conceptual representation.