Wang Yilei, Zhao Jiabao, Ones Deniz S, He Liang, Xu Xin
Shanghai Institute of AI for Education, East China Normal University, 3663 North Zhongshan Road, Shanghai, 200062, China.
School of Computer Science and Technology, East China Normal University, Shanghai, China.
Sci Rep. 2025 Jan 2;15(1):519. doi: 10.1038/s41598-024-84109-5.
In the social sciences, recent advances in Large Language Models (LLMs) have the potential to revolutionize the study of human behavior by facilitating the creation of realistic agents characterized by a diverse range of individual differences. This research presents novel simulation studies assessing GPT-4's ability to role-play real-world individuals with diverse Big Five personality profiles. In Simulation 1, emulated personality responses exhibited superior internal consistency and a more distinct and structured factor organization than the responses of the humans they were based on. Furthermore, the emulated scores exhibited remarkably high convergent validity with the human self-reported personality scale scores. Simulation 2 replicated these findings but showed that the robustness of GPT-4's role-playing appears to wane as the complexity of the roles increases. Introducing supplementary demographic information alongside personality affected convergent validities for certain emulated traits. However, including additional demographic characteristics enhanced the validity of emulated personality scores for predicting external criteria. Collectively, the findings point to a promising future for using LLMs to emulate realistic agents, based on real persons, with varied personality traits. The broader applied implications and avenues for future research are discussed.
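The abstract names internal consistency and convergent validity but does not say which statistics were used. As a minimal sketch only, the block below assumes Cronbach's alpha as the internal consistency index and a Pearson correlation between human and emulated scale scores as the convergent validity index; both are common choices, not confirmed details of the study. The function names and the toy responses are hypothetical and are not data from the paper.

```python
from statistics import mean, variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondents, each a list of k item scores."""
    k = len(responses[0])
    # Sample variance of each item across respondents.
    item_vars = [variance([row[j] for row in responses]) for j in range(k)]
    # Sample variance of the summed scale score.
    total_var = variance([sum(row) for row in responses])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def pearson_r(x, y):
    """Pearson correlation between two equally long score lists."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


# Hypothetical toy data: 5 respondents x 4 items for one trait (1-5 Likert).
human_items = [
    [4, 5, 3, 4],
    [2, 1, 3, 2],
    [5, 4, 4, 5],
    [3, 3, 2, 4],
    [1, 2, 2, 1],
]
# Hypothetical emulated responses for the same five people.
emulated_items = [
    [4, 4, 4, 4],
    [2, 2, 2, 1],
    [5, 5, 4, 5],
    [3, 3, 3, 3],
    [1, 1, 2, 1],
]

alpha_human = cronbach_alpha(human_items)
alpha_emulated = cronbach_alpha(emulated_items)

# Convergent validity: correlate human and emulated scale (sum) scores.
human_scores = [sum(row) for row in human_items]
emulated_scores = [sum(row) for row in emulated_items]
convergent_r = pearson_r(human_scores, emulated_scores)

print(f"alpha (human)    = {alpha_human:.3f}")
print(f"alpha (emulated) = {alpha_emulated:.3f}")
print(f"convergent r     = {convergent_r:.3f}")
```

In this framing, a higher alpha for the emulated responses than for the human ones would correspond to the "superior internal consistency" the abstract reports, and a high correlation between the two sets of scale scores would correspond to high convergent validity.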