Zentall Thomas R
Department of Psychology, University of Kentucky, Lexington, KY, 40506-0044, USA.
Learn Behav. 2020 Mar;48(1):165-172. doi: 10.3758/s13420-019-00407-3.
Delay of reinforcement is generally thought to be inversely correlated with speed of acquisition. However, in the case of simultaneous discrimination learning, in which choice results in immediate reinforcement, delay of reinforcement can improve acquisition. For example, in the ephemeral reward task, animals are given a choice between two alternatives, A and B. Choice of A provides reinforcement, and the trial is over. Choice of B provides reinforcement and access to alternative A (thus, two reinforcements). Many animals appear unable to learn to choose B consistently, but inserting a 20-s delay between choice and outcome has been shown to facilitate optimal choice. Similarly, pigeons given a choice between a signal for one pellet and a signal for two pellets (each occurring without a delay) have difficulty learning to choose the two-pellet alternative, unless the reinforcement is delayed. In a version of object permanence, food is placed in one of two containers, and the pigeon must choose the container with the food. Pigeons have difficulty reliably choosing the correct container unless a brief delay is inserted between baiting and choice. Finally, pigeons have been shown to prefer a suboptimal alternative (a 20% chance of getting a cue for reinforcement) over an optimal alternative (a 100% chance of getting a cue for 50% reinforcement). However, if pigeons are forced to wait 20 s following their choice to receive the cues, no preference for the suboptimal alternative is found. Thus, impulsive choice may be reduced by delaying the consequence of that choice.
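The sense in which these alternatives are "optimal" or "suboptimal" can be checked with simple arithmetic on expected reinforcement per trial. The sketch below is illustrative only. The function name is hypothetical, and it assumes (the abstract does not say so explicitly) that the cue on the suboptimal alternative is always followed by reinforcement.

```python
# Expected reinforcements per trial for two of the tasks described above.
# ASSUMPTION (not stated in the abstract): the cue that appears on 20% of
# suboptimal trials is always followed by reinforcement (probability 1.0).

def expected_reinforcement(p_cue: float, p_reinforcement_given_cue: float) -> float:
    """Probability of a cue times the probability of reinforcement given that cue."""
    return p_cue * p_reinforcement_given_cue

# Suboptimal-choice task
suboptimal = expected_reinforcement(0.20, 1.00)  # 20% chance of a cue for reinforcement
optimal = expected_reinforcement(1.00, 0.50)     # 100% chance of a cue for 50% reinforcement

# Ephemeral reward task: reinforcements obtained per trial
ephemeral_a = 1  # choosing A: one reinforcement, trial ends
ephemeral_b = 2  # choosing B: one reinforcement plus access to A

print(f"suboptimal: {suboptimal:.2f}, optimal: {optimal:.2f}")
print(f"ephemeral reward task: A = {ephemeral_a}, B = {ephemeral_b}")
```

Under that assumption, the optimal alternative yields more than twice the expected reinforcement of the suboptimal one, and choosing B in the ephemeral reward task yields twice as many reinforcements as choosing A.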