
Decision making under uncertainty: a neural model based on partially observable Markov decision processes.

Affiliations

Department of Computer Science and Engineering and Neurobiology and Behavior Program, University of Washington, Seattle, WA, USA.

Publication Information

Front Comput Neurosci. 2010 Nov 24;4:146. doi: 10.3389/fncom.2010.00146. eCollection 2010.

Abstract

A fundamental problem faced by animals is learning to select actions based on noisy sensory information and incomplete knowledge of the world. It has been suggested that the brain engages in Bayesian inference during perception but how such probabilistic representations are used to select actions has remained unclear. Here we propose a neural model of action selection and decision making based on the theory of partially observable Markov decision processes (POMDPs). Actions are selected based not on a single "optimal" estimate of state but on the posterior distribution over states (the "belief" state). We show how such a model provides a unified framework for explaining experimental results in decision making that involve both information gathering and overt actions. The model utilizes temporal difference (TD) learning for maximizing expected reward. The resulting neural architecture posits an active role for the neocortex in belief computation while ascribing a role to the basal ganglia in belief representation, value computation, and action selection. When applied to the random dots motion discrimination task, model neurons representing belief exhibit responses similar to those of LIP neurons in primate neocortex. The appropriate threshold for switching from information gathering to overt actions emerges naturally during reward maximization. Additionally, the time course of reward prediction error in the model shares similarities with dopaminergic responses in the basal ganglia during the random dots task. For tasks with a deadline, the model learns a decision making strategy that changes with elapsed time, predicting a collapsing decision threshold consistent with some experimental studies. The model provides a new framework for understanding neural decision making and suggests an important role for interactions between the neocortex and the basal ganglia in learning the mapping between probabilistic sensory representations and actions that maximize rewards.
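The abstract describes two computations that can be illustrated concretely: a Bayesian belief update over hidden states from noisy observations, and TD learning of action values over those beliefs, where "gather more information" competes with committing to an overt choice. The toy sketch below shows that combination for a two-alternative random-dots-style task. It is not the paper's neural model: the observation probability, sampling cost, reward values, belief discretization, and the use of tabular Q-learning (one TD method) are all illustrative assumptions.

```python
# Minimal sketch: Bayesian belief updating plus TD (Q-) learning for a
# two-choice "random dots"-style task cast as a POMDP. Illustrative toy only;
# all parameters below are assumed, not taken from the paper.
import numpy as np

rng = np.random.default_rng(0)

# Hidden states: 0 = motion left, 1 = motion right.
# Actions: 0 = sample (another noisy observation), 1 = choose left, 2 = choose right.
P_CORRECT_OBS = 0.6          # assumed probability an observation matches the true state
SAMPLE_COST = -0.05          # assumed small cost per extra observation
REWARD_CORRECT, REWARD_WRONG = 1.0, 0.0
N_BINS = 21                  # discretize belief P(state = right) for tabular TD
ALPHA, GAMMA, EPSILON = 0.1, 1.0, 0.1

Q = np.zeros((N_BINS, 3))    # action values over discretized belief states

def belief_bin(b_right):
    """Map a belief in [0, 1] to a discrete bin index."""
    return min(int(b_right * N_BINS), N_BINS - 1)

def update_belief(b_right, obs):
    """Bayes rule for the two-state observation model."""
    like_right = P_CORRECT_OBS if obs == 1 else 1 - P_CORRECT_OBS
    like_left = P_CORRECT_OBS if obs == 0 else 1 - P_CORRECT_OBS
    post = like_right * b_right
    return post / (post + like_left * (1 - b_right))

for episode in range(20_000):
    true_state = rng.integers(2)
    b = 0.5                  # uniform prior over the two motion directions
    done = False
    while not done:
        s = belief_bin(b)
        a = rng.integers(3) if rng.random() < EPSILON else int(np.argmax(Q[s]))
        if a == 0:           # sample: pay a cost, observe, update the belief
            obs = true_state if rng.random() < P_CORRECT_OBS else 1 - true_state
            b_next = update_belief(b, obs)
            r = SAMPLE_COST
            target = r + GAMMA * np.max(Q[belief_bin(b_next)])
            b = b_next
        else:                # overt choice: reward depends on correctness, episode ends
            choice = a - 1   # action 1 -> left (state 0), action 2 -> right (state 1)
            r = REWARD_CORRECT if choice == true_state else REWARD_WRONG
            target = r
            done = True
        Q[s, a] += ALPHA * (target - Q[s, a])   # TD error drives learning

# The belief region where "sample" remains the greedy action reveals the decision
# threshold that emerges from reward maximization, as described in the abstract.
greedy = np.argmax(Q, axis=1)
print("greedy action per belief bin (0=sample, 1=left, 2=right):", greedy)
```

In this sketch the learned policy keeps sampling at intermediate beliefs and commits once the belief is sufficiently far from 0.5, mirroring the emergent switch from information gathering to overt action that the abstract attributes to reward maximization.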


Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d362/2998859/93836333fa22/fncom-04-00146-g001.jpg
