Schoenegger Philipp, Greenberg Spencer, Grishin Alexander, Lewis Joshua, Caviola Lucius
London School of Economics and Political Science, London, UK.
Spark Wave, New York, NY, USA.
Commun Psychol. 2025 Feb 12;3(1):23. doi: 10.1038/s44271-025-00205-w.
We assess the abilities of both specialized deep neural networks, such as PersonalityMap, and general LLMs, including GPT-4o and Claude 3 Opus, in understanding human personality by predicting correlations between personality questionnaire items. All AI models outperform the vast majority of laypeople and academic experts. However, we can improve the accuracy of individual correlation predictions by taking the median prediction per group to produce a "wisdom of the crowds" estimate. Thus, we also compare the median predictions from laypeople, academic experts, GPT-4o/Claude 3 Opus, and PersonalityMap. Based on medians, PersonalityMap and academic experts surpass both LLMs and laypeople on most measures. These results suggest that while advanced LLMs make superior predictions compared to most individual humans, specialized models like PersonalityMap can match even expert group-level performance in domain-specific tasks. This underscores the capabilities of large language models while emphasizing the continued relevance of specialized systems as well as human experts for personality research.
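The median-based "wisdom of the crowds" aggregation described above can be illustrated with a minimal sketch. The data below are hypothetical toy values, not results from the study, and the use of mean absolute error as the accuracy measure is an assumption made for illustration; the abstract does not specify which measures were used. The function names `mean_abs_error` and `crowd_median` are likewise invented for this example.

```python
from statistics import median


def mean_abs_error(predicted, actual):
    """Average absolute gap between predicted and actual correlations."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


def crowd_median(predictions_by_rater):
    """Take the median prediction per item pair across all raters in a group."""
    # zip(*rows) transposes raters x items into items x raters.
    return [median(item_preds) for item_preds in zip(*predictions_by_rater)]


# Hypothetical "true" correlations for three questionnaire item pairs.
actual = [0.50, -0.20, 0.10]

# Hypothetical predictions from three raters in one group (one row per rater).
raters = [
    [0.80, -0.10, 0.30],
    [0.40, -0.50, 0.00],
    [0.55, 0.10, -0.20],
]

individual_errors = [mean_abs_error(r, actual) for r in raters]
crowd = crowd_median(raters)
crowd_error = mean_abs_error(crowd, actual)

print("individual errors:", [round(e, 3) for e in individual_errors])
print("crowd (median) error:", round(crowd_error, 3))
```

In this toy example the group median has a lower error than every individual rater, because individual over- and underestimates partly cancel. The data were chosen to show that effect; a median is not guaranteed to beat the best individual in every case.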