
A neural network model for timing control with reinforcement.

Author Information

Wang Jing, El-Jayyousi Yousuf, Ozden Ilker

Affiliations

Department of Biomedical, Biological, and Chemical Engineering, University of Missouri, Columbia, MO, United States.

Publication Information

Front Comput Neurosci. 2022 Oct 5;16:918031. doi: 10.3389/fncom.2022.918031. eCollection 2022.

Abstract

How do humans and animals perform trial-and-error learning when the space of possibilities is infinite? In a previous study, we used an interval timing production task and discovered an updating strategy in which the agent adjusted the behavioral and neuronal noise for exploration. In the experiment, human subjects proactively generated a series of timed motor outputs. Positive or negative feedback was provided after each response based on the timing accuracy. We found that the sequential motor timing varied at two temporal scales: long-term correlation around the target interval due to memory drifts and short-term adjustments of timing variability according to feedback. We have previously described these two key features of timing variability with an augmented Gaussian process, termed the reward-sensitive Gaussian process (RSGP). In a nutshell, the temporal covariance of the timing variable was updated based on the feedback history to recreate the two behavioral characteristics mentioned above. However, the RSGP was mainly descriptive and lacked a neurobiological account of how reward feedback can be used by a neural circuit to adjust motor variability. Here we provide a mechanistic model and simulate the process by borrowing the architecture of recurrent neural networks (RNNs). While the recurrent connections provided the long-term serial correlation in motor timing, to capture the reward-driven short-term variations we introduced reward-dependent variability in the network connectivity, inspired by the stochastic nature of synaptic transmission in the brain. Our model was able to recursively generate an output sequence incorporating internal variability and external reinforcement in a Bayesian framework. We show that the model can generate the temporal structure of the motor variability as a basis for the exploration and exploitation trade-off. Unlike other neural network models that search for a unique network connectivity giving the best match between model prediction and observation, this model can estimate the uncertainty associated with each outcome and thus does a better job of teasing apart adjustable task-relevant variability from unexplained variability. The proposed artificial neural network model parallels the mechanisms of information processing in neural systems and can extend the framework of brain-inspired reinforcement learning (RL) in continuous state control.
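The abstract describes the mechanism only in prose; the authors' actual implementation is not reproduced here. The following is a minimal, hypothetical Python/NumPy sketch of the core idea as stated: a small recurrent network produces a timed output, and reward feedback rescales the amplitude of trial-to-trial perturbations of the recurrent weights, widening the search after negative feedback and narrowing it after positive feedback. All names and parameter values (e.g. `produce_interval`, `noise_gain`, the target interval) are illustrative assumptions, not the paper's code.

```python
# Illustrative sketch only (not the authors' published implementation):
# reward-dependent variability in recurrent connectivity drives an
# exploration/exploitation trade-off in a timing production task.
import numpy as np

rng = np.random.default_rng(0)

N = 50                                         # recurrent units
W = rng.normal(0.0, 1.0 / np.sqrt(N), (N, N))  # recurrent weights
b = rng.normal(0.0, 0.5, N)                    # constant input drive
w_out = rng.normal(0.0, 1.0 / np.sqrt(N), N)   # linear readout

target, tolerance = 1.0, 0.1    # target interval and reward window (seconds)
noise_gain = 0.05               # current amplitude of synaptic perturbations
gain_up, gain_down = 1.3, 0.8   # noise updates after failure / success

def produce_interval(W_trial, dt=0.1, n_steps=100):
    """Relax the recurrent dynamics, then map the readout to a positive interval."""
    x = np.zeros(N)
    for _ in range(n_steps):
        x = x + dt * (-x + np.tanh(W_trial @ x + b))
    return float(np.exp(w_out @ x))             # produced interval in seconds

produced, rewards = [], []
for trial in range(200):
    # Reward-dependent variability: perturb the weights by noise whose
    # amplitude reflects the recent feedback history.
    W_trial = W + noise_gain * rng.normal(0.0, 1.0 / np.sqrt(N), (N, N))
    t_prod = produce_interval(W_trial)
    rewarded = abs(t_prod - target) < tolerance

    if rewarded:
        W = W_trial               # keep the perturbation that worked (exploit)
        noise_gain *= gain_down   # and shrink exploration noise
    else:
        noise_gain *= gain_up     # widen the search after negative feedback
    noise_gain = float(np.clip(noise_gain, 1e-3, 0.5))

    produced.append(t_prod)
    rewards.append(rewarded)
```

In this toy version, the slow accumulation of accepted perturbations in `W` stands in for the long-term serial correlation attributed to recurrent connectivity, while the reward-scaled perturbation on each trial stands in for the short-term, feedback-driven adjustment of variability; the paper's model additionally places these updates in a Bayesian framework that estimates the uncertainty of each outcome.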


Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8cad/9579423/05029ec72002/fncom-16-918031-g001.jpg
