Personality testing of large language models: limited temporal stability, but highlighted prosociality.

Author information

Bodroža Bojana, Dinić Bojana M, Bojić Ljubiša

Affiliations

Department of Psychology, Faculty of Philosophy, University of Novi Sad, Novi Sad, Serbia.

Institute for Artificial Intelligence Research and Development of Serbia, Novi Sad, Serbia.

Publication information

R Soc Open Sci. 2024 Oct 9;11(10):240180. doi: 10.1098/rsos.240180. eCollection 2024 Oct.

Abstract

As large language models (LLMs) continue to gain popularity due to their human-like traits and the intimacy they offer to users, their societal impact inevitably expands. This leads to the rising necessity for comprehensive studies to fully understand LLMs and reveal their potential opportunities, drawbacks and overall societal impact. With that in mind, this research conducted an extensive investigation into seven LLMs, aiming to assess the temporal stability and inter-rater agreement on their responses on personality instruments in two time points. In addition, LLMs' personality profile was analysed and compared with human normative data. The findings revealed varying levels of inter-rater agreement in the LLMs' responses over a short time, with some LLMs showing higher agreement (e.g. Llama3 and GPT-4o) compared with others (e.g. GPT-4 and Gemini). Furthermore, agreement depended on used instruments as well as on domain or trait. This implies the variable robustness in LLMs' ability to reliably simulate stable personality characteristics. In the case of scales which showed at least fair agreement, LLMs displayed mostly a socially desirable profile in both agentic and communal domains, as well as a prosocial personality profile reflected in higher agreeableness and conscientiousness and lower Machiavellianism. Exhibiting temporal stability and coherent responses on personality traits is crucial for AI systems due to their societal impact and AI safety concerns.
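The abstract does not say which agreement statistic was used. Its wording ("at least fair agreement") matches the conventional verbal labels for kappa-type coefficients, where values of roughly 0.21 to 0.40 are often called "fair". As a minimal sketch under that assumption, the Python below computes unweighted Cohen's kappa between one model's answers to the same items at two time points. The function name and the response data are hypothetical and are not taken from the study.

```python
from collections import Counter


def cohens_kappa(time1, time2):
    """Unweighted Cohen's kappa for two equally long lists of categorical ratings."""
    if len(time1) != len(time2) or not time1:
        raise ValueError("both administrations need the same non-zero number of items")
    n = len(time1)
    # Observed agreement: share of items answered identically at both time points.
    observed = sum(a == b for a, b in zip(time1, time2)) / n
    # Chance agreement: product of the marginal category proportions, summed over categories.
    counts1, counts2 = Counter(time1), Counter(time2)
    expected = sum(counts1[c] * counts2[c] for c in counts1) / (n * n)
    if expected == 1.0:
        # Degenerate case: a single category was used at both time points.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical 5-point Likert answers by one model to the same ten items.
t1 = [5, 4, 4, 2, 5, 3, 4, 1, 5, 4]
t2 = [5, 4, 3, 2, 5, 3, 4, 2, 4, 4]
kappa = cohens_kappa(t1, t2)
print(f"kappa = {kappa:.3f}")
```

Unweighted kappa treats every disagreement as equally severe. For ordinal Likert responses a weighted variant, which penalises a 4-versus-5 mismatch less than a 1-versus-5 mismatch, may be the more natural choice.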
