Institute of the Czech National Corpus, Faculty of Arts, Charles University, Prague, Czech Republic.
Department of Slavic and Hungarian Studies, Faculty of Language, Literature and Humanities, Humboldt University of Berlin, Berlin, Germany.
PLoS One. 2024 Mar 13;19(3):e0298522. doi: 10.1371/journal.pone.0298522. eCollection 2024.
This study explores the capabilities of large language models to replicate the behavior of individuals with underdeveloped cognitive and language skills. Specifically, we investigate whether these models can simulate child-like language and cognitive development while solving false-belief tasks, namely change-of-location and unexpected-content tasks. OpenAI's GPT-3.5-turbo and GPT-4 models were prompted to simulate children (N = 1296) aged one to six years. The simulation was instantiated through three prompt types: plain zero-shot, chain-of-thought, and primed-by-corpus. We evaluated the correctness of responses to assess the models' capacity to mimic the cognitive skills of the simulated children. Both models showed increasing response correctness and rising language complexity with simulated age, in line with the gradual enhancement of linguistic and cognitive abilities documented in the extensive research literature on child development. GPT-4 generally aligned more closely with the developmental curve observed in 'real' children, although it displayed hyper-accuracy under certain conditions, notably with the primed-by-corpus prompt type. Task type, prompt type, and the choice of language model influenced developmental patterns, whereas temperature and the gender of the simulated parent and child did not consistently affect results. We also analyzed linguistic complexity, examining utterance length and Kolmogorov complexity; these analyses revealed a gradual increase in linguistic complexity corresponding to the age of the simulated children, regardless of other variables. These findings show that the language models are capable of downplaying their abilities to achieve a faithful simulation of prompted personas.
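To make the simulation setup concrete, the following is a minimal illustrative sketch of a plain zero-shot prompt to the OpenAI chat completions API for a change-of-location (Sally-Anne style) false-belief task. The prompt wording, model identifier, and temperature shown here are hypothetical placeholders, not the study's actual materials; the sketch assumes the openai Python client (version 1.x) and a valid API key.

```python
# Illustrative sketch only: prompts and parameters below are hypothetical,
# not the ones reported in the study.
from openai import OpenAI  # assumes openai-python >= 1.0

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def simulate_child_response(age: int, task_story: str, question: str,
                            model: str = "gpt-4", temperature: float = 1.0) -> str:
    """Plain zero-shot prompt: ask the model to answer as a child of a given age."""
    system_prompt = (
        f"You are a {age}-year-old child. Answer the question below "
        f"exactly as a child of that age would, in the child's own words."
    )
    response = client.chat.completions.create(
        model=model,
        temperature=temperature,
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": f"{task_story}\n\n{question}"},
        ],
    )
    return response.choices[0].message.content


# Example: a change-of-location false-belief task.
story = ("Sally puts her marble in the basket and leaves the room. "
         "While she is away, Anne moves the marble into the box.")
question = "When Sally comes back, where will she look for her marble?"
print(simulate_child_response(age=3, task_story=story, question=question))
```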
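The two complexity measures named in the abstract can likewise be sketched in code. Kolmogorov complexity is uncomputable and is therefore commonly approximated by the length of a losslessly compressed representation; the snippet below uses zlib compression and whitespace tokenization as stand-ins, which may differ from the study's exact procedure.

```python
# Illustrative approximation of the abstract's complexity measures; the study's
# actual tokenization and compression choices are not specified here.
import zlib


def mean_utterance_length(utterances: list[str]) -> float:
    """Average number of whitespace-separated tokens per utterance."""
    lengths = [len(u.split()) for u in utterances]
    return sum(lengths) / len(lengths) if lengths else 0.0


def compression_complexity(text: str) -> float:
    """Compressed size divided by original size, a common Kolmogorov-complexity proxy."""
    data = text.encode("utf-8")
    return len(zlib.compress(data, level=9)) / len(data) if data else 0.0


# Hypothetical simulated-child responses of increasing simulated age.
responses = [
    "Marble in basket.",
    "She will look in the basket because she did not see Anne move it to the box.",
]
print(mean_utterance_length(responses))
print(compression_complexity(" ".join(responses)))
```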