Köster Raphael, Duéñez-Guzmán Edgar A, Cunningham William A, Leibo Joel Z
Google DeepMind, London EC4A 3TW, United Kingdom.
Department of Psychology, University of Toronto, Toronto, ON M5S 3G3, Canada.
Proc Natl Acad Sci U S A. 2025 Jun 24;122(25):e2319947121. doi: 10.1073/pnas.2319947121. Epub 2025 Jun 16.
Theories of group bias often posit an internal preparedness to bias one's cognition in favor of the in-group (often envisioned as a product of evolution). In contrast, other theories suggest that group biases can emerge from nonspecialized cognitive processes. These perspectives have historically been difficult to disambiguate because observed behavior can often be attributed to innate processes, even when groups are experimentally assigned. Here, we use modern techniques from the field of AI to ask what group biases can be expected from a learning agent that is a pure blank slate, without any intrinsic social biases, and whose lifetime of experiences can be tightly controlled. This is possible because deep reinforcement-learning agents learn to convert raw sensory input (i.e., pixels) to reward-driven action, a unique feature among cognitive models. We find that blank-slate agents do develop group biases based on arbitrary group differences (i.e., color). We show that the bias develops as a result of familiarity of experience and depends on the visual patterns becoming associated with reward through interaction. The bias that artificial agents display is not a static reflection of the bias in their stream of experiences. In this minimal environment, the bias can be overcome given enough positive experiences, although unlearning the bias takes longer than acquiring it. Further, we show how this style of tabula rasa group behavior model can be used to test fine-grained predictions of psychological theories.