Chenguang Li, Jonah Brenner, Adam Boesky, Sharad Ramanathan, Gabriel Kreiman
Biophysics Program, Harvard University, Cambridge, MA 02138.
Harvard University, Cambridge, MA 02138.
bioRxiv. 2024 May 22:2024.05.22.595306. doi: 10.1101/2024.05.22.595306.
We show that neural networks can implement reward-seeking behavior using only local predictive updates and internal noise. These networks interact autonomously with an environment and can switch between exploration and exploitation, a transition we show is governed by attractor dynamics. Networks adapt to changes in their architectures, environments, or motor interfaces without any external control signals. When networks can choose among different tasks, they form preferences that depend on patterns of noise and initialization, and we show that these preferences can be biased by network architecture or by changing learning rates. Our algorithm provides a flexible, biologically plausible way of interacting with environments without requiring an explicit environmental reward function, enabling behavior that is both highly adaptable and autonomous. Code is available at https://github.com/ccli3896/PaN.
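The abstract's core mechanism, local predictive updates combined with internal noise, can be illustrated with a minimal sketch. The code below is an illustrative assumption, not the paper's PaN implementation: a single layer predicts a target signal, each unit updates its weights using only its own local prediction error (a delta rule), and Gaussian noise injected into the activity plays the role of the internal noise that drives exploration. The network, learning rate, and noise scale are all hypothetical choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical minimal sketch of "local predictive updates + internal noise".
# Not the paper's actual algorithm; names and parameters are illustrative.
n_in, n_out = 4, 2
W = rng.normal(scale=0.1, size=(n_out, n_in))
lr = 0.05          # learning rate (the abstract notes this can bias preferences)
noise_scale = 0.1  # magnitude of internal noise injected into activity

def step(x, target):
    # Noisy forward pass: internal noise perturbs the unit activities,
    # which is what produces exploratory variability in behavior.
    a = W @ x + rng.normal(scale=noise_scale, size=n_out)
    # Local predictive update: each unit reduces its own prediction error
    # using only locally available quantities (its error and its inputs).
    err = target - a
    return a, np.outer(err, x)

x = rng.normal(size=n_in)
target = np.array([1.0, -1.0])
for _ in range(200):
    a, dW = step(x, target)
    W += lr * dW

# The noise-free prediction settles near the target, while injected noise
# keeps the network's moment-to-moment activity variable.
print(np.round(W @ x, 2))
```

In this toy setting the prediction-error term pulls activity toward the target (exploitation) while the injected noise keeps perturbing it (exploration); larger `noise_scale` or `lr` shifts that balance, loosely mirroring how the paper reports learning rates and noise patterns biasing behavior.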