Jensen Greg, Muñoz Fabian, Alkan Yelda, Ferrera Vincent P, Terrace Herbert S
Department of Neuroscience, Columbia University, New York, New York, United States of America; Department of Psychology, Columbia University, New York, New York, United States of America.
PLoS Comput Biol. 2015 Sep 25;11(9):e1004523. doi: 10.1371/journal.pcbi.1004523. eCollection 2015.
Transitive inference (the ability to infer that B > D given that B > C and C > D) is a widespread characteristic of serial learning, observed in dozens of species. Despite these robust behavioral effects, reinforcement learning models reliant on reward prediction error or associative strength routinely fail to perform these inferences. We propose an algorithm called betasort, inspired by cognitive processes, which performs transitive inference at low computational cost. This is accomplished by (1) representing stimulus positions along a unit span using beta distributions, (2) treating positive and negative feedback asymmetrically, and (3) updating the position of every stimulus during every trial, whether that stimulus was visible or not. Performance was compared for rhesus macaques, humans, and the betasort algorithm, as well as Q-learning, an established reward-prediction error (RPE) model. Of these, only Q-learning failed to respond above chance during critical test trials. Betasort's success (when compared to RPE models) and its computational efficiency (when compared to full Markov decision process implementations) suggest that the study of reinforcement learning in organisms will be best served by a feature-driven approach to comparing formal models.
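To illustrate the three features listed in the abstract, the sketch below implements a betasort-style learner in Python. It is not the authors' published update rules; the class name, the relaxation rate, and the specific increment sizes used for the asymmetric updates are all assumptions chosen for illustration. It does show the core ideas: each stimulus's position on the unit span is a beta distribution, choices are made by sampling positions, feedback updates rewarded and unrewarded outcomes differently, and every stimulus decays toward uncertainty on every trial, presented or not.

```python
import random

class BetaSortSketch:
    """Minimal sketch of a betasort-style learner (not the published update rules)."""

    def __init__(self, stimuli, relax=0.05):
        # (1) Each stimulus's position on the unit span is a Beta(upper, lower)
        # distribution; start at Beta(1, 1), i.e. uniform uncertainty on [0, 1].
        self.params = {s: [1.0, 1.0] for s in stimuli}
        self.relax = relax  # assumed decay rate; a free parameter in this sketch

    def choose(self, a, b):
        # Sample a position for each presented stimulus and pick the larger one.
        sample_a = random.betavariate(*self.params[a])
        sample_b = random.betavariate(*self.params[b])
        return a if sample_a > sample_b else b

    def update(self, chosen, other, rewarded):
        # (3) Every stimulus relaxes toward uncertainty on every trial,
        # whether or not it was presented on that trial.
        for upper_lower in self.params.values():
            upper_lower[0] *= 1.0 - self.relax
            upper_lower[1] *= 1.0 - self.relax

        # (2) Asymmetric feedback: reward nudges the chosen stimulus toward the
        # top of the span and the unchosen one toward the bottom; an error does
        # the reverse with a larger (assumed) step.
        if rewarded:
            self.params[chosen][0] += 1.0
            self.params[other][1] += 1.0
        else:
            self.params[chosen][1] += 2.0
            self.params[other][0] += 2.0


if __name__ == "__main__":
    # Train only on adjacent pairs of an implied order A > B > C > D > E,
    # then test the non-adjacent pair B vs D (the critical inference trial).
    order = ["A", "B", "C", "D", "E"]
    model = BetaSortSketch(order)
    for _ in range(500):
        i = random.randrange(len(order) - 1)
        higher, lower = order[i], order[i + 1]
        pick = model.choose(higher, lower)
        model.update(pick, lower if pick == higher else higher, rewarded=(pick == higher))
    test_correct = sum(model.choose("B", "D") == "B" for _ in range(1000))
    print(f"B chosen over D on {test_correct / 10:.1f}% of test trials")
```

Because every stimulus's estimated position lives on the same unit span, the learner ends up preferring B over D without ever having seen that pair during training, which is the qualitative behavior the abstract contrasts with RPE models such as Q-learning.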