Dasgupta Sakyasingha, Wörgötter Florentin, Manoonpong Poramate
Institute for Physics - Biophysics, George-August-University Göttingen, Germany; Bernstein Center for Computational Neuroscience, George-August-University Göttingen, Germany.
Bernstein Center for Computational Neuroscience, George-August-University Göttingen, Germany; Center for Biorobotics, Maersk Mc-Kinney Møller Institute, University of Southern Denmark, Odense, Denmark.
Front Neural Circuits. 2014 Oct 28;8:126. doi: 10.3389/fncir.2014.00126. eCollection 2014.
Goal-directed decision making in biological systems is broadly based on associations between conditional and unconditional stimuli. Such associative learning can be further classified into classical conditioning (correlation-based learning) and operant conditioning (reward-based learning). A number of computational and experimental studies have firmly established the role of the basal ganglia in reward-based learning, whereas the cerebellum plays an important role in developing specific conditioned responses. Although they are viewed as distinct learning systems, recent animal experiments point toward their complementary roles in behavioral learning and also show substantial two-way communication between these two brain structures. Based on this notion of cooperative learning, in this paper we hypothesize that the basal ganglia and cerebellar learning systems work in parallel and interact with each other. We envision that this interaction is influenced by a reward-modulated heterosynaptic plasticity (RMHP) rule at the thalamus, which guides the overall goal-directed behavior. Using a recurrent neural network actor-critic model of the basal ganglia and a feed-forward correlation-based learning model of the cerebellum, we demonstrate that the RMHP rule can effectively balance the outcomes of the two learning systems. This is tested in simulated environments of increasing complexity, using a four-wheeled robot performing a foraging task in both static and dynamic configurations. Although the model is built at a simplified level of biological abstraction, we clearly demonstrate that such an RMHP-induced combinatorial learning mechanism leads to more stable and faster learning of goal-directed behaviors than either individual system. Thus, this paper provides a computational model for the adaptive combination of the basal ganglia and cerebellar learning systems by way of neuromodulated plasticity for goal-directed decision making in biological and bio-mimetic organisms.
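The abstract states that an RMHP rule balances the outputs of the basal ganglia (actor-critic) and cerebellar (correlation-based) learners. As a rough illustration only, the sketch below assumes that the two outputs are combined as a weighted sum and that each weight follows a generic three-factor update of the form Δw_i = η (r − r̄) o_i (o − ō), where r̄ and ō are running averages of reward and combined output. The class name `RMHPCombiner`, the parameters `eta` and `tau`, and the averaging scheme are hypothetical; the paper's exact equations, normalization, and parameter values may differ.

```python
import numpy as np


class RMHPCombiner:
    """Hypothetical sketch of a reward-modulated heterosynaptic combiner.

    Combines a basal-ganglia-like output (o_bg) and a cerebellum-like
    output (o_cb) with plastic weights. This is an illustrative
    three-factor rule, not the paper's exact formulation.
    """

    def __init__(self, eta=0.01, w_bg=0.5, w_cb=0.5, tau=0.9):
        self.eta = eta                      # learning rate (assumed value)
        self.w = np.array([w_bg, w_cb], dtype=float)
        self.tau = tau                      # smoothing factor for running averages
        self.r_bar = 0.0                    # running average of reward
        self.o_bar = 0.0                    # running average of combined output

    def combine(self, o_bg, o_cb):
        """Weighted sum of the two learners' outputs."""
        return float(self.w @ np.array([o_bg, o_cb], dtype=float))

    def update(self, o_bg, o_cb, reward):
        """Apply one plasticity step and return the combined output."""
        x = np.array([o_bg, o_cb], dtype=float)
        o = float(self.w @ x)
        # Three factors: reward deviation, presynaptic activity of each
        # pathway, and deviation of the shared postsynaptic output.
        self.w += self.eta * (reward - self.r_bar) * x * (o - self.o_bar)
        # Update running averages after the weight change.
        self.r_bar = self.tau * self.r_bar + (1.0 - self.tau) * reward
        self.o_bar = self.tau * self.o_bar + (1.0 - self.tau) * o
        return o


if __name__ == "__main__":
    c = RMHPCombiner(eta=0.1)
    out = c.update(o_bg=1.0, o_cb=0.0, reward=1.0)
    print(out, c.w)
```

Because the postsynaptic term (o − ō) depends on both pathways, a change in one learner's output also shifts the update applied to the other pathway's weight, which is the heterosynaptic aspect that the rule's name refers to.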