Department of Neuroscience, Columbia University College of Physicians and Surgeons, New York, NY 10032-2695, USA.
Neuroimage. 2010 Sep;52(3):833-47. doi: 10.1016/j.neuroimage.2010.01.047. Epub 2010 Jan 25.
Complex tasks often require memory of recent events, knowledge of the context in which they occur, and the goals we intend to reach. All this information is stored in our mental states. Given a set of mental states, reinforcement learning (RL) algorithms predict the optimal policy that maximizes future reward. RL algorithms assign a value to each already-known state, so that discovering the optimal policy reduces to selecting the action leading to the state with the highest value. But how does the brain create representations of these mental states in the first place? We propose a mechanism for the creation of mental states that contain information about the temporal statistics of the events in a particular context. We suggest that the mental states are represented by stable patterns of reverberating activity, which are attractors of the neural dynamics. These representations are built from neurons that are selective to specific combinations of external events (e.g., sensory stimuli) and pre-existing mental states. Consistent with this notion, we find that neurons in the amygdala and in orbitofrontal cortex (OFC) often exhibit this form of mixed selectivity. We propose that activating different mixed selectivity neurons in a fixed temporal order modifies synaptic connections so that conjunctions of events and mental states merge into a single pattern of reverberating activity. This process corresponds to the birth of a new, different mental state that encodes a different temporal context. This concretion process depends on temporal contiguity, i.e., on the probability that a combination of an event and mental states follows or precedes the events and states that define a certain context. The information contained in the context thereby allows an animal to unambiguously assign a value to events that initially appeared in different situations with different meanings.
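The value-based selection the abstract describes (picking the action that leads to the highest-valued state) can be sketched in a toy form. The transition table, state values, and state labels below are illustrative assumptions, not the paper's task:

```python
import numpy as np

# Hypothetical toy setup: 4 mental states, 2 actions per state.
# T[s, a] gives the successor state reached by taking action a in state s;
# V[s] is a learned value for each already-known state.
T = np.array([[1, 2],
              [3, 0],
              [3, 1],
              [0, 2]])
V = np.array([0.0, 0.5, 0.2, 1.0])

def greedy_action(state):
    """Select the action leading to the successor state with the highest value."""
    return int(np.argmax(V[T[state]]))

print(greedy_action(0))  # from state 0, successors have values V[1]=0.5, V[2]=0.2
```

With values already assigned to states, the "policy" question collapses to a one-line lookup, which is the simplification the abstract attributes to RL algorithms.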
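The idea that mental states are "stable patterns of reverberating activity, which are attractors of the neural dynamics" can be illustrated with a minimal Hopfield-style network. This is only a sketch of attractor recall under Hebbian storage; the patterns and update rule here are illustrative assumptions, not the model from the paper:

```python
import numpy as np

# Two orthogonal binary patterns stand in for two stored "mental states".
patterns = np.array([[ 1, -1,  1, -1,  1, -1],
                     [ 1,  1, -1, -1,  1,  1]])

# Hebbian outer-product storage; no self-connections.
W = patterns.T @ patterns
np.fill_diagonal(W, 0)

def recall(cue, steps=10):
    """Relax a cue toward the nearest stored attractor."""
    s = cue.copy()
    for _ in range(steps):
        h = W @ s
        s = np.where(h == 0, s, np.sign(h)).astype(int)  # hold state on ties
    return s

# A corrupted version of pattern 0 converges back to the stored pattern.
cue = patterns[0].copy()
cue[0] = -cue[0]
print(recall(cue))
```

The stored patterns are fixed points of the dynamics, so a partial or noisy conjunction of inputs settles into a single stable pattern, which is the sense in which a merged conjunction becomes one new mental state.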