
A neural network model for timing control with reinforcement.

Author Information

Wang Jing, El-Jayyousi Yousuf, Ozden Ilker

Affiliations

Department of Biomedical, Biological, and Chemical Engineering, University of Missouri, Columbia, MO, United States.

Publication Information

Front Comput Neurosci. 2022 Oct 5;16:918031. doi: 10.3389/fncom.2022.918031. eCollection 2022.

Abstract

How do humans and animals perform trial-and-error learning when the space of possibilities is infinite? In a previous study, we used an interval timing production task and discovered an updating strategy in which the agent adjusted the behavioral and neuronal noise for exploration. In the experiment, human subjects proactively generated a series of timed motor outputs. Positive or negative feedback was provided after each response based on the timing accuracy. We found that the sequential motor timing varied at two temporal scales: long-term correlation around the target interval due to memory drifts and short-term adjustments of timing variability according to feedback. We have previously described these two key features of timing variability with an augmented Gaussian process, termed the reward-sensitive Gaussian process (RSGP). In a nutshell, the temporal covariance of the timing variable was updated based on the feedback history to recreate the two behavioral characteristics mentioned above. However, the RSGP was mainly descriptive and lacked a neurobiological account of how reward feedback can be used by a neural circuit to adjust motor variability. Here we provide a mechanistic model and simulate the process by borrowing the architecture of recurrent neural networks (RNNs). While the recurrent connections provided the long-term serial correlation in motor timing, to capture the reward-driven short-term variations we introduced reward-dependent variability in the network connectivity, inspired by the stochastic nature of synaptic transmission in the brain. Our model was able to recursively generate an output sequence incorporating internal variability and external reinforcement in a Bayesian framework. We show that the model can generate the temporal structure of the motor variability as a basis for the exploration and exploitation trade-off. Unlike other neural network models that search for a unique network connectivity giving the best match between model prediction and observation, this model can estimate the uncertainty associated with each outcome and thus does a better job of teasing apart adjustable task-relevant variability from unexplained variability. The proposed artificial neural network model parallels the mechanisms of information processing in neural systems and can extend the framework of brain-inspired reinforcement learning (RL) in continuous state control.
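The abstract describes the mechanism only in prose; the authors' actual implementation is not reproduced here. The following is a minimal, hypothetical Python/NumPy sketch of the core idea as stated: a small recurrent network produces a timed output, and reward feedback rescales the amplitude of trial-to-trial perturbations of the recurrent weights, widening the search after negative feedback and narrowing it after positive feedback. All names and parameter values (e.g. `produce_interval`, `noise_gain`, the target interval) are illustrative assumptions, not the paper's code.

```python
# Illustrative sketch only (not the authors' published implementation):
# reward-dependent variability in recurrent connectivity drives an
# exploration/exploitation trade-off in a timing production task.
import numpy as np

rng = np.random.default_rng(0)

N = 50                                         # recurrent units
W = rng.normal(0.0, 1.0 / np.sqrt(N), (N, N))  # recurrent weights
b = rng.normal(0.0, 0.5, N)                    # constant input drive
w_out = rng.normal(0.0, 1.0 / np.sqrt(N), N)   # linear readout

target, tolerance = 1.0, 0.1    # target interval and reward window (seconds)
noise_gain = 0.05               # current amplitude of synaptic perturbations
gain_up, gain_down = 1.3, 0.8   # noise updates after failure / success

def produce_interval(W_trial, dt=0.1, n_steps=100):
    """Relax the recurrent dynamics, then map the readout to a positive interval."""
    x = np.zeros(N)
    for _ in range(n_steps):
        x = x + dt * (-x + np.tanh(W_trial @ x + b))
    return float(np.exp(w_out @ x))             # produced interval in seconds

produced, rewards = [], []
for trial in range(200):
    # Reward-dependent variability: perturb the weights by noise whose
    # amplitude reflects the recent feedback history.
    W_trial = W + noise_gain * rng.normal(0.0, 1.0 / np.sqrt(N), (N, N))
    t_prod = produce_interval(W_trial)
    rewarded = abs(t_prod - target) < tolerance

    if rewarded:
        W = W_trial               # keep the perturbation that worked (exploit)
        noise_gain *= gain_down   # and shrink exploration noise
    else:
        noise_gain *= gain_up     # widen the search after negative feedback
    noise_gain = float(np.clip(noise_gain, 1e-3, 0.5))

    produced.append(t_prod)
    rewards.append(rewarded)
```

In this toy version, the slow accumulation of accepted perturbations in `W` stands in for the long-term serial correlation attributed to recurrent connectivity, while the reward-scaled perturbation on each trial stands in for the short-term, feedback-driven adjustment of variability; the paper's model additionally places these updates in a Bayesian framework that estimates the uncertainty of each outcome.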


Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8cad/9579423/05029ec72002/fncom-16-918031-g001.jpg
