Artificial Intelligence and the Illusion of Understanding: A Systematic Review of Theory of Mind and Large Language Models.

Authors

Marchetti Antonella, Manzi Federico, Riva Giuseppe, Gaggioli Andrea, Massaro Davide

Affiliations

Department of Psychology, Research Center on Theory of Mind and Social Competences in the Lifespan, Università Cattolica del Sacro Cuore, Milan, Italy.

Humane Technology Laboratory, Università Cattolica del Sacro Cuore, Milan, Italy.

Publication Information

Cyberpsychol Behav Soc Netw. 2025 Jul;28(7):505-514. doi: 10.1089/cyber.2024.0536. Epub 2025 May 7.

Abstract

The development of Large Language Models (LLMs) has sparked significant debate regarding their capacity for Theory of Mind (ToM)-the ability to attribute mental states to oneself and others. This systematic review examines the extent to which LLMs exhibit Artificial ToM (AToM) by evaluating their performance on ToM tasks and comparing it with human responses. While LLMs, particularly GPT-4, perform well on first-order false belief tasks, they struggle with more complex reasoning, such as second-order beliefs and recursive inferences, where humans consistently outperform them. Moreover, the review underscores the variability in ToM assessments, as many studies adapt classical tasks for LLMs, raising concerns about comparability with human ToM. Most evaluations remain constrained to text-based tasks, overlooking embodied and multimodal dimensions crucial to human social cognition. This review discusses the "illusion of understanding" in LLMs for two primary reasons: First, their lack of the developmental and cognitive mechanisms necessary for genuine ToM, and second, methodological biases in test designs that favor LLMs' strengths, limiting direct comparisons with human performance. The findings highlight the need for more ecologically valid assessments and interdisciplinary research to better delineate the limitations and potential of AToM. This set of issues is highly relevant to psychology, as language is generally considered just one component in the broader development of human ToM, a perspective that contrasts with the dominant approach in AToM studies. This discrepancy raises critical questions about the extent to which human ToM and AToM are comparable.
