Schleyer Michael, Fendt Markus, Schuller Sarah, Gerber Bertram
Department Genetics of Learning and Memory, Leibniz Institute for Neurobiology, Magdeburg, Germany.
Institute for Pharmacology and Toxicology, Otto von Guericke University Magdeburg, Magdeburg, Germany.
Front Psychol. 2018 Aug 24;9:1494. doi: 10.3389/fpsyg.2018.01494. eCollection 2018.
Finding rewards and avoiding punishments are powerful goals of behavior. To maximize reward and minimize punishment, it is beneficial to learn about the stimuli that predict their occurrence, and decades of research have provided insight into the brain processes underlying such associative reinforcement learning. In addition, it is well known in experimental psychology, yet often unacknowledged in neighboring scientific disciplines, that subjects also learn about the stimuli that predict the absence of reinforcement. Here we evaluate evidence for both these learning processes. We focus on two case studies that both provide a baseline level of behavior against which the effects of associative learning can be assessed. Firstly, we report pertinent evidence from Drosophila larvae. A re-analysis of the literature reveals that through paired presentations of an odor A and a sugar reward (A+) the animals learn that the reward can be found where the odor is, and therefore show an above-baseline preference for the odor. In contrast, through unpaired training (A/+) the animals learn that the reward can be found precisely where the odor is not, and accordingly these larvae show a below-baseline preference for it (the same is the case, with inverted signs, for learning through taste punishment). In addition, we present previously unpublished data demonstrating that both these learning processes also take place in Drosophila larvae during a two-odor, differential conditioning protocol (A+/B), i.e., the animals learn about both the rewarded stimulus A and the non-rewarded stimulus B (again, this is likewise the case for differential conditioning with taste punishment). Secondly, after briefly discussing published evidence from adult Drosophila, honeybees, and rats, we report an unpublished data set showing that, relative to baseline behavior after truly random presentations of a visual stimulus A and punishment, rats exhibit memories of opposite valence upon paired and unpaired training.
Collectively, the evidence conforms to classical findings in experimental psychology and suggests that, across species, animals associatively learn both through paired and through unpaired presentations of stimuli with reinforcement, but with opposite valence. While the brain mechanisms of unpaired learning for the most part still need to be uncovered, the immediate implication is that using unpaired procedures as a mnemonically neutral control for associative reinforcement learning may be leading analyses astray.