Ho Mark K, MacGlashan James, Littman Michael L, Cushman Fiery
Department of Cognitive, Linguistic & Psychological Sciences, Brown University, Box 1821, Providence, RI 02912, United States.
Department of Computer Science, Brown University, 115 Waterman St, Providence, RI 02906, United States.
Cognition. 2017 Oct;167:91-106. doi: 10.1016/j.cognition.2017.03.006. Epub 2017 Mar 22.
Humans often attempt to influence one another's behavior using rewards and punishments. How does this work? Psychologists have often assumed that "evaluative feedback" influences behavior via standard learning mechanisms that learn from environmental contingencies. On this view, teaching with evaluative feedback involves leveraging learning systems designed to maximize an organism's positive outcomes. Yet, despite the parsimony of this assumption, programs of research predicated on it, such as those in developmental psychology, animal behavior, and human-robot interaction, have had limited success. We offer an explanation by analyzing the logic of evaluative feedback and show that specialized learning mechanisms are uniquely favored in the case of evaluative feedback from a social partner. Specifically, evaluative feedback works best when it is treated as communicating information about the value of an action rather than as a form of reward to be maximized. This account suggests that human learning from evaluative feedback depends on inferences about communicative intent, goals, and other mental states, much like learning from other sources, such as demonstration, observation, and instruction. Because these abilities are especially developed in humans, the present account also explains why evaluative feedback is far more widespread in humans than in non-human animals.