Chenguang Li, Jonah Brenner, Adam Boesky, Sharad Ramanathan, Gabriel Kreiman
Biophysics Program, Harvard University, Cambridge, MA 02138.
Harvard University, Cambridge, MA 02138.
bioRxiv. 2024 May 22:2024.05.22.595306. doi: 10.1101/2024.05.22.595306.
We show that neural networks can implement reward-seeking behavior using only local predictive updates and internal noise. These networks interact autonomously with an environment and can switch between exploration and exploitation, a transition we show is governed by attractor dynamics. Networks adapt to changes in their architectures, environments, or motor interfaces without any external control signals. When networks can choose among different tasks, they form preferences that depend on patterns of noise and initialization, and we show that these preferences can be biased by network architecture or by changing learning rates. Our algorithm provides a flexible, biologically plausible way of interacting with environments without requiring an explicit environmental reward function, enabling behavior that is both highly adaptable and autonomous. Code is available at https://github.com/ccli3896/PaN.
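The abstract's core mechanism, local predictive updates combined with internal noise, can be illustrated with a minimal sketch. The code below is an illustrative assumption, not the paper's PaN implementation: a single layer predicts a target signal, each unit updates its weights using only its own local prediction error (a delta rule), and Gaussian noise injected into the activity plays the role of the internal noise that drives exploration. The network, learning rate, and noise scale are all hypothetical choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical minimal sketch of "local predictive updates + internal noise".
# Not the paper's actual algorithm; names and parameters are illustrative.
n_in, n_out = 4, 2
W = rng.normal(scale=0.1, size=(n_out, n_in))
lr = 0.05          # learning rate (the abstract notes this can bias preferences)
noise_scale = 0.1  # magnitude of internal noise injected into activity

def step(x, target):
    # Noisy forward pass: internal noise perturbs the unit activities,
    # which is what produces exploratory variability in behavior.
    a = W @ x + rng.normal(scale=noise_scale, size=n_out)
    # Local predictive update: each unit reduces its own prediction error
    # using only locally available quantities (its error and its inputs).
    err = target - a
    return a, np.outer(err, x)

x = rng.normal(size=n_in)
target = np.array([1.0, -1.0])
for _ in range(200):
    a, dW = step(x, target)
    W += lr * dW

# The noise-free prediction settles near the target, while injected noise
# keeps the network's moment-to-moment activity variable.
print(np.round(W @ x, 2))
```

In this toy setting the prediction-error term pulls activity toward the target (exploitation) while the injected noise keeps perturbing it (exploration); larger `noise_scale` or `lr` shifts that balance, loosely mirroring how the paper reports learning rates and noise patterns biasing behavior.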