Shultz Thomas R, Wise Jamie M, Nobandegani Ardavan S
Department of Psychology, McGill University, Montreal, Canada.
School of Computer Science, McGill University, Montreal, Canada.
R Soc Open Sci. 2025 Feb 20;12(2):241313. doi: 10.1098/rsos.241313. eCollection 2025 Feb.
We examine whether a leading AI system, GPT-4, understands text as well as humans do, first using a well-established standardized test of discourse comprehension. On this test, GPT-4 performs slightly, but not statistically significantly, better than humans, given the very high level of human performance. Both GPT-4 and humans make correct inferences about information that is not explicitly stated in the text, a critical test of understanding. Next, we use more difficult passages to determine whether they allow larger differences between GPT-4 and humans to emerge. GPT-4 does considerably better on this more difficult text than do the high school and university students for whom these passages are designed, as admission tests of reading comprehension. Deeper exploration of GPT-4's performance on material from one of these admission tests reveals generally accepted signatures of genuine understanding, namely generalization and inference.