Jennifer Hu, Felix Sosa, Tomer Ullman
Department of Cognitive Science, Johns Hopkins University, Baltimore, MD, USA.
Data Science and AI Institute, Johns Hopkins University, Baltimore, MD, USA.
Philos Trans R Soc Lond B Biol Sci. 2025 Aug 14;380(1932):20230499. doi: 10.1098/rstb.2023.0499.
The question of whether large language models (LLMs) possess Theory of Mind (ToM), often defined as the ability to reason about others' mental states, has sparked significant scientific and public interest. However, the evidence as to whether LLMs possess ToM is mixed, and the recent growth in evaluations has not resulted in a convergence. Here, we take inspiration from cognitive science to re-evaluate the state of ToM evaluation in LLMs. We argue that a major reason for the disagreement on whether LLMs have ToM is a lack of clarity on whether models should be expected to match human behaviours, or the computations underlying those behaviours. We also highlight ways in which current evaluations may be deviating from 'pure' measurements of ToM abilities, which also contributes to the confusion. We conclude by discussing several directions for future research, including the relationship between ToM and pragmatic communication, which could advance our understanding of artificial systems as well as human cognition.

This article is part of the theme issue 'At the heart of human communication: new views on the complex relationship between pragmatics and Theory of Mind'.