Free energy, value, and attractors.

Affiliations

The Wellcome Trust Centre for Neuroimaging, UCL, Institute of Neurology, 12 Queen Square, London WC1N 3BG, UK.

Publication information

Comput Math Methods Med. 2012;2012:937860. doi: 10.1155/2012/937860. Epub 2011 Dec 21.

Abstract

It has been suggested recently that action and perception can be understood as minimising the free energy of sensory samples. This ensures that agents sample the environment to maximise the evidence for their model of the world, such that exchanges with the environment are predictable and adaptive. However, the free energy account does not invoke reward or cost-functions from reinforcement-learning and optimal control theory. We therefore ask whether reward is necessary to explain adaptive behaviour. The free energy formulation uses ideas from statistical physics to explain action in terms of minimising sensory surprise. Conversely, reinforcement-learning has its roots in behaviourism and engineering and assumes that agents optimise a policy to maximise future reward. This paper tries to connect the two formulations and concludes that optimal policies correspond to empirical priors on the trajectories of hidden environmental states, which compel agents to seek out the (valuable) states they expect to encounter.
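
The link between free energy and sensory surprise that the abstract relies on can be made explicit with the standard variational bound. The sketch below uses generic free-energy-principle notation that is assumed here rather than taken from the paper: s for sensory samples, \psi for hidden environmental states, \mu for the internal states that parameterise a recognition density q, and m for the agent's generative model.

% Sketch only: notation assumed, not the paper's exact conventions.
F(s,\mu) \;=\; \mathbb{E}_{q(\psi\mid\mu)}\!\big[\ln q(\psi\mid\mu) \;-\; \ln p(s,\psi\mid m)\big]
         \;=\; -\ln p(s\mid m) \;+\; D_{\mathrm{KL}}\!\big[\,q(\psi\mid\mu)\,\big\|\,p(\psi\mid s,m)\,\big]
         \;\ge\; -\ln p(s\mid m)
% The Kullback-Leibler term is nonnegative, so F upper-bounds surprise, -ln p(s|m).

Minimising F with respect to \mu (perception) tightens this bound, while acting to change which s are sampled can reduce the surprise term itself. On this reading, the paper's conclusion that optimal policies are empirical priors on state trajectories says that valuable states are simply the states an agent's model expects it to occupy.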

Similar articles

1. Free energy, value, and attractors. Comput Math Methods Med. 2012;2012:937860. doi: 10.1155/2012/937860. Epub 2011 Dec 21.
2. Reinforcement learning or active inference? PLoS One. 2009 Jul 29;4(7):e6421. doi: 10.1371/journal.pone.0006421.
3. Generalised free energy and active inference. Biol Cybern. 2019 Dec;113(5-6):495-513. doi: 10.1007/s00422-019-00805-w. Epub 2019 Sep 27.
4. Novelty is not surprise: Human exploratory and adaptive behavior in sequential decision-making. PLoS Comput Biol. 2021 Jun 3;17(6):e1009070. doi: 10.1371/journal.pcbi.1009070. eCollection 2021 Jun.
5. Free-energy and the brain. Synthese. 2007 Dec 1;159(3):417-458. doi: 10.1007/s11229-007-9237-y.
6. What is value-accumulated reward or evidence? Front Neurorobot. 2012 Nov 2;6:11. doi: 10.3389/fnbot.2012.00011. eCollection 2012.
7. Asymmetric and adaptive reward coding via normalized reinforcement learning. PLoS Comput Biol. 2022 Jul 21;18(7):e1010350. doi: 10.1371/journal.pcbi.1010350. eCollection 2022 Jul.
8. Reward prediction errors, not sensory prediction errors, play a major role in model selection in human reinforcement learning. Neural Netw. 2022 Oct;154:109-121. doi: 10.1016/j.neunet.2022.07.002. Epub 2022 Jul 13.
9. A free energy principle for the brain. J Physiol Paris. 2006 Jul-Sep;100(1-3):70-87. doi: 10.1016/j.jphysparis.2006.10.001. Epub 2006 Nov 13.
10. Multi-task reinforcement learning in humans. Nat Hum Behav. 2021 Jun;5(6):764-773. doi: 10.1038/s41562-020-01035-y. Epub 2021 Jan 28.

Cited by

2. Cortical development in the structural model and free energy minimization. Cereb Cortex. 2024 Oct 3;34(10). doi: 10.1093/cercor/bhae416.
3. Active Inference in Psychology and Psychiatry: Progress to Date? Entropy (Basel). 2024 Sep 30;26(10):833. doi: 10.3390/e26100833.
6. Markov Blankets and Mirror Symmetries-Free Energy Minimization and Mesocortical Anatomy. Entropy (Basel). 2024 Mar 27;26(4):287. doi: 10.3390/e26040287.
7. Predictive coding networks for temporal prediction. PLoS Comput Biol. 2024 Apr 1;20(4):e1011183. doi: 10.1371/journal.pcbi.1011183. eCollection 2024 Apr.
8. Establishing brain states in neuroimaging data. PLoS Comput Biol. 2023 Oct 16;19(10):e1011571. doi: 10.1371/journal.pcbi.1011571. eCollection 2023 Oct.
10. The mesoanatomy of the cortex, minimization of free energy, and generative cognition. Front Comput Neurosci. 2023 May 12;17:1169772. doi: 10.3389/fncom.2023.1169772. eCollection 2023.

References

1. Emerging of Stochastic Dynamical Equalities and Steady State Thermodynamics from Darwinian Dynamics. Commun Theor Phys. 2008 May 15;49(5):1073-1090. doi: 10.1088/0253-6102/49/5/01.
3. Principles of the self-organizing dynamic system. J Gen Psychol. 1947 Oct;37(2):125-8. doi: 10.1080/00221309.1947.9918144.
4. Action and behavior: a free-energy formulation. Biol Cybern. 2010 Mar;102(3):227-60. doi: 10.1007/s00422-010-0364-z. Epub 2010 Feb 11.
5. Entropy demystified: the "thermo"-dynamics of stochastically fluctuating systems. Methods Enzymol. 2009;467:111-134. doi: 10.1016/S0076-6879(09)67005-1.
6. Reinforcement learning or active inference? PLoS One. 2009 Jul 29;4(7):e6421. doi: 10.1371/journal.pone.0006421.
7. Invariant template matching in systems with spatiotemporal coding: A matter of instability. Neural Netw. 2009 May;22(4):425-49. doi: 10.1016/j.neunet.2009.01.014. Epub 2009 Feb 10.
8. Global view of bionetwork dynamics: adaptive landscape. J Genet Genomics. 2009 Feb;36(2):63-73. doi: 10.1016/S1673-8527(08)60093-4.
9. Decision theory, reinforcement learning, and the brain. Cogn Affect Behav Neurosci. 2008 Dec;8(4):429-53. doi: 10.3758/CABN.8.4.429.
10. Hierarchical models in the brain. PLoS Comput Biol. 2008 Nov;4(11):e1000211. doi: 10.1371/journal.pcbi.1000211. Epub 2008 Nov 7.
