Department of Psychology, University of California, Los Angeles, CA, USA.
Department of Statistics, University of California, Los Angeles, CA, USA.
Nat Hum Behav. 2023 Sep;7(9):1526-1541. doi: 10.1038/s41562-023-01659-w. Epub 2023 Jul 31.
The recent advent of large language models has reinvigorated debate over whether human cognitive capacities might emerge in such generic models given sufficient training data. Of particular interest is the ability of these models to reason about novel problems zero-shot, without any direct training. In human cognition, this capacity is closely tied to an ability to reason by analogy. Here we performed a direct comparison between human reasoners and a large language model (the text-davinci-003 variant of Generative Pre-trained Transformer (GPT)-3) on a range of analogical tasks, including a non-visual matrix reasoning task based on the rule structure of Raven's Standard Progressive Matrices. We found that GPT-3 displayed a surprisingly strong capacity for abstract pattern induction, matching or even surpassing human capabilities in most settings; preliminary tests of GPT-4 indicated even better performance. Our results indicate that large language models such as GPT-3 have acquired an emergent ability to find zero-shot solutions to a broad range of analogy problems.