Department of Bioengineering, Imperial College London, London, UK.
School of Psychological Sciences, Monash University, Melbourne, Australia.
Sci Rep. 2024 Jul 10;14(1):15917. doi: 10.1038/s41598-024-65604-1.
Large Language Models (LLMs), such as the Generative Pre-trained Transformer (GPT), have shown remarkable performance in various cognitive tasks. However, it remains unclear whether these models can accurately infer human perceptual representations. Previous research has addressed this question by quantifying correlations between the similarity response patterns of humans and LLMs. Correlation provides a measure of similarity, but it relies on pre-defined item labels and does not distinguish category-level from item-level similarity, so it falls short of characterizing the detailed structural correspondence between humans and LLMs. To assess their structural equivalence in more detail, we propose the use of an unsupervised alignment method based on Gromov-Wasserstein optimal transport (GWOT). GWOT compares similarity structures without relying on pre-defined label correspondences and can reveal fine-grained structural similarities and differences that simple correlation analysis may not detect. Using a large dataset of similarity judgments for 93 colors, we compared the color similarity structures of humans (color-neurotypical and color-atypical participants) and two GPT models (GPT-3.5 and GPT-4). Our results show that the similarity structure of color-neurotypical participants can be remarkably well aligned with that of GPT-4 and, to a lesser extent, with that of GPT-3.5. These results contribute to methodological advances in comparing LLMs with human perception and highlight the potential of unsupervised alignment methods to reveal detailed structural correspondences.
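As a rough illustration of the contrast drawn in the abstract, the sketch below compares a label-dependent correlation with an unsupervised Gromov-Wasserstein alignment of two dissimilarity matrices. It assumes square-loss GW with uniform item weights, solved by the entropic scheme of Peyré, Cuturi and Solomon (2016) with a log-domain Sinkhorn inner loop. The function names, the ε value, and the three-item toy data are hypothetical, and the study's own pipeline (for example, how ε was chosen or how many initializations were tried) may differ.

```python
import numpy as np


def upper_tri_correlation(c1, c2):
    """Label-dependent baseline: Pearson r over the upper triangles.

    Assumes row/column i of c1 and c2 refer to the same labelled item.
    """
    iu = np.triu_indices_from(c1, k=1)
    return float(np.corrcoef(c1[iu], c2[iu])[0, 1])


def _logsumexp(x, axis):
    """Numerically stable log(sum(exp(x))) along one axis."""
    m = x.max(axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.exp(x - m).sum(axis=axis))


def sinkhorn_log(a, b, cost, eps, n_iter=10_000, tol=1e-10):
    """Entropic OT plan between histograms a and b, computed in the log domain."""
    log_k = -cost / eps
    log_a, log_b = np.log(a), np.log(b)
    log_v = np.zeros_like(b)
    for _ in range(n_iter):
        log_u = log_a - _logsumexp(log_k + log_v[None, :], axis=1)
        log_v = log_b - _logsumexp(log_k + log_u[:, None], axis=0)
        plan = np.exp(log_u[:, None] + log_k + log_v[None, :])
        # Column sums are exact after the v-update; stop once row sums match too.
        if np.abs(plan.sum(axis=1) - a).max() < tol:
            break
    return plan


def gw_loss(c1, c2, plan):
    """Square-loss GW objective: sum_ijkl (c1[i,k] - c2[j,l])**2 plan[i,j] plan[k,l]."""
    p, q = plan.sum(axis=1), plan.sum(axis=0)
    const = (c1**2 @ p)[:, None] + (c2**2 @ q)[None, :]
    return float(np.sum((const - 2 * c1 @ plan @ c2.T) * plan))


def entropic_gw(c1, c2, eps=5e-3, n_outer=50):
    """Entropic GW alignment with uniform weights, starting from the product plan.

    Each outer step linearises the GW objective at the current plan and solves
    the resulting entropic OT problem with Sinkhorn. No item labels are used:
    only within-space dissimilarities enter the cost.
    """
    n, m = c1.shape[0], c2.shape[0]
    p, q = np.full(n, 1.0 / n), np.full(m, 1.0 / m)
    plan = np.outer(p, q)
    const = (c1**2 @ p)[:, None] + (c2**2 @ q)[None, :]
    for _ in range(n_outer):
        grad = 2 * (const - 2 * c1 @ plan @ c2.T)  # gradient of gw_loss at plan
        plan = sinkhorn_log(p, q, grad, eps)
    return plan


if __name__ == "__main__":
    # Hypothetical toy data: three 1-D "stimuli" with all-distinct pairwise distances.
    x = np.array([0.0, 1.0, 3.0])
    c1 = np.abs(x[:, None] - x[None, :])
    c1 /= c1.max()
    # A second "observer" with the same structure but hidden labels:
    # item j of c2 is item perm[j] of c1.
    perm = np.array([2, 0, 1])
    c2 = c1[np.ix_(perm, perm)]

    print("correlation under mismatched labels:", upper_tri_correlation(c1, c2))
    plan = entropic_gw(c1, c2, eps=2e-3)
    print("GW matching:", plan.argmax(axis=1), "truth:", np.argsort(perm))
```

In this toy case the correlation computed under the scrambled labels is negative even though both matrices encode the same structure, while the row-wise maxima of the transport plan should recover the hidden relabelling, because GW compares only distances within each space. For realistic inputs such as 93×93 color dissimilarity matrices, an established library (e.g. the `ot.gromov` module of POT) plus several initializations would likely be the more practical choice.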