Department of Cognitive, Linguistic, and Psychological Sciences, Brown University.
Department of Psychology, Harvard University.
J Exp Psychol Gen. 2019 Mar;148(3):520-549. doi: 10.1037/xge0000569.
Carrots and sticks motivate behavior, and people can teach new behaviors to other organisms, such as children or nonhuman animals, by tapping into their reward learning mechanisms. But how people teach with reward and punishment depends on their expectations about the learner. We examine how people teach using reward and punishment by contrasting two hypotheses. The first is evaluative feedback as reinforcement, where rewards and punishments are used to shape learner behavior through reinforcement learning mechanisms. The second is evaluative feedback as communication, where rewards and punishments are used to signal target behavior to a learning agent reasoning about a teacher's pedagogical goals. We present formalizations of learning from these two teaching strategies based on computational frameworks for reinforcement learning. Our analysis based on these models motivates a simple interactive teaching paradigm that distinguishes between the two teaching hypotheses. Across three sets of experiments, we find that people are strongly biased to use evaluative feedback communicatively rather than as reinforcement.
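To make the contrast between the two hypotheses concrete, here is a minimal Python sketch of two ways a learner might interpret the same teacher feedback. It is an illustration only and not the authors' formalization: the action names, the learning rate `alpha`, the `reliability` parameter, and both update functions are hypothetical choices. The first learner treats feedback as a reward and nudges a value estimate toward it; the second treats feedback as a signal and performs a Bayesian update over which action the teacher has in mind.

```python
from typing import Dict, List, Tuple

# Hypothetical action set, chosen only for illustration.
ACTIONS: List[str] = ["sit", "roll", "bark"]


def reinforcement_update(q: Dict[str, float], action: str,
                         feedback: float, alpha: float = 0.5) -> Dict[str, float]:
    """Feedback as reinforcement: move the action's value estimate toward the feedback."""
    new_q = dict(q)
    new_q[action] += alpha * (feedback - new_q[action])
    return new_q


def communication_update(belief: Dict[str, float], action: str,
                         feedback: float, reliability: float = 0.9) -> Dict[str, float]:
    """Feedback as communication: Bayesian update over which action is the target."""
    posterior = {}
    for target, prior in belief.items():
        # Assumption: the teacher gives positive feedback with probability
        # `reliability` when the action is the target, else 1 - reliability.
        p_positive = reliability if action == target else 1.0 - reliability
        likelihood = p_positive if feedback > 0 else 1.0 - p_positive
        posterior[target] = prior * likelihood
    total = sum(posterior.values())
    return {t: p / total for t, p in posterior.items()}


# A made-up feedback history: (action taken, teacher feedback).
history: List[Tuple[str, float]] = [("roll", -1.0), ("sit", 1.0), ("bark", -1.0)]

q = {a: 0.0 for a in ACTIONS}
belief = {a: 1.0 / len(ACTIONS) for a in ACTIONS}
for action, feedback in history:
    q = reinforcement_update(q, action, feedback)
    belief = communication_update(belief, action, feedback)

best_by_value = max(q, key=q.get)
best_by_belief = max(belief, key=belief.get)
print("value estimates:", q, "->", best_by_value)
print("target beliefs:", belief, "->", best_by_belief)
```

In this toy history both learners end up preferring the same action. What differs is the quantity each one tracks: an estimate of reward to be obtained versus a belief about what the teacher wants.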