
Enhancing reinforcement learning models by including direct and indirect pathways improves performance on striatal dependent tasks.

Affiliations

Department of Bioengineering, Volgenau School of Engineering, George Mason University, Fairfax, Virginia, United States of America.

Neural Computation Unit, Okinawa Institute of Science and Technology Graduate University, Okinawa, Japan.

Publication information

PLoS Comput Biol. 2023 Aug 18;19(8):e1011385. doi: 10.1371/journal.pcbi.1011385. eCollection 2023 Aug.

Abstract

A major advance in understanding learning behavior stems from experiments showing that reward learning requires dopamine inputs to striatal neurons and arises from synaptic plasticity of cortico-striatal synapses. Numerous reinforcement learning models mimic this dopamine-dependent synaptic plasticity by using the reward prediction error, which resembles dopamine neuron firing, to learn the best action in response to a set of cues. Though these models can explain many facets of behavior, reproducing some types of goal-directed behavior, such as renewal and reversal, requires additional model components. Here we present a reinforcement learning model, TD2Q, which better corresponds to the basal ganglia with two Q matrices, one representing direct pathway neurons (G) and another representing indirect pathway neurons (N). Unlike previous two-Q architectures, a novel and critical aspect of TD2Q is to update the G and N matrices utilizing the temporal difference reward prediction error. A best action is selected for N and G using a softmax with a reward-dependent adaptive exploration parameter, and then differences are resolved using a second selection step applied to the two action probabilities. The model is tested on a range of multi-step tasks including extinction, renewal, and discrimination; switching reward probability learning; and sequence learning. Simulations show that TD2Q produces behaviors similar to rodents in choice and sequence learning tasks, and that use of the temporal difference reward prediction error is required to learn multi-step tasks. Blocking the update rule on the N matrix blocks discrimination learning, as observed experimentally. Performance in the sequence learning task is dramatically improved with two matrices. These results suggest that including additional aspects of basal ganglia physiology can improve the performance of reinforcement learning models, better reproduce animal behaviors, and provide insight into the role of direct- and indirect-pathway striatal neurons.
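The abstract does not give the update equations, so the following Python sketch only illustrates one plausible reading of the TD2Q idea: two Q matrices (G for the direct pathway, N for the indirect pathway), both driven by a shared temporal-difference reward prediction error, with per-matrix softmax action proposals resolved by a second selection step. All parameter names, the opposite-sign ("opponent") update of G and N, and the reward-dependent exploration rule are assumptions for illustration, not the paper's exact equations.

```python
import numpy as np

rng = np.random.default_rng(0)

n_states, n_actions = 5, 3
alpha, gamma = 0.1, 0.9                 # learning rate and discount (assumed values)
beta_min, beta_gain = 1.0, 5.0          # assumed reward-dependent exploration schedule
G = np.zeros((n_states, n_actions))     # direct-pathway ("Go") Q matrix
N = np.zeros((n_states, n_actions))     # indirect-pathway ("NoGo") Q matrix


def softmax(q, beta):
    """Numerically stable softmax over action values."""
    p = np.exp(beta * (q - q.max()))
    return p / p.sum()


def exploration_beta(avg_reward):
    """Assumed adaptive rule: exploit more (higher beta) as average reward rises."""
    return beta_min + beta_gain * max(avg_reward, 0.0)


def choose_action(state, avg_reward):
    """Each matrix proposes an action via softmax; a second selection step
    over the two proposals' probabilities resolves any disagreement."""
    beta = exploration_beta(avg_reward)
    pG = softmax(G[state], beta)
    pN = softmax(-N[state], beta)        # assumed: N suppresses actions it values
    aG = rng.choice(n_actions, p=pG)
    aN = rng.choice(n_actions, p=pN)
    if aG == aN:
        return aG
    probs = np.array([pG[aG], pN[aN]])
    probs /= probs.sum()
    return aG if rng.random() < probs[0] else aN


def td_update(state, action, reward, next_state):
    """A single TD reward prediction error drives opposite-sign updates of
    G and N (an assumed opponency rule consistent with the abstract)."""
    value = G[state, action] - N[state, action]
    next_value = (G[next_state] - N[next_state]).max()
    rpe = reward + gamma * next_value - value
    G[state, action] += alpha * rpe
    N[state, action] -= alpha * rpe
```

Under this reading, setting `alpha` to zero for the N matrix (blocking its update rule) leaves G intact but removes the learned suppression of unrewarded actions, which is the kind of manipulation the abstract links to a loss of discrimination learning.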

