Habits, action sequences and reinforcement learning.

Affiliation

Brain & Mind Research Institute, University of Sydney, Camperdown, NSW 2050, Australia.

Publication

Eur J Neurosci. 2012 Apr;35(7):1036-51. doi: 10.1111/j.1460-9568.2012.08050.x.

PMID: 22487034
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3325518/
Abstract

It is now widely accepted that instrumental actions can be either goal-directed or habitual; whereas the former are rapidly acquired and regulated by their outcome, the latter are reflexive, elicited by antecedent stimuli rather than their consequences. Model-based reinforcement learning (RL) provides an elegant description of goal-directed action. Through exposure to states, actions and rewards, the agent rapidly constructs a model of the world and can choose an appropriate action based on quite abstract changes in environmental and evaluative demands. This model is powerful but has a problem explaining the development of habitual actions. To account for habits, theorists have argued that another action controller is required, called model-free RL, that does not form a model of the world but rather caches action values within states allowing a state to select an action based on its reward history rather than its consequences. Nevertheless, there are persistent problems with important predictions from the model; most notably the failure of model-free RL correctly to predict the insensitivity of habitual actions to changes in the action-reward contingency. Here, we suggest that introducing model-free RL in instrumental conditioning is unnecessary, and demonstrate that reconceptualizing habits as action sequences allows model-based RL to be applied to both goal-directed and habitual actions in a manner consistent with what real animals do. This approach has significant implications for the way habits are currently investigated and generates new experimental predictions.
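The abstract contrasts a controller that caches action values from reward history (model-free RL) with one that learns a model of action consequences and evaluates actions against current outcome values (model-based RL). The toy outcome-devaluation simulation below is a minimal sketch of that standard contrast, not the authors' model. The action names, outcomes, outcome values and learning rate `alpha` are all hypothetical. The simulation shows only the textbook prediction that a model-based learner switches choice as soon as an outcome is devalued, while a purely model-free learner keeps choosing by its cached history. It does not implement the action-sequence account the paper proposes.

```python
"""Toy outcome-devaluation test: model-free (cached values) vs model-based (planning).

Hypothetical one-step instrumental task: two actions, each deterministically
producing one outcome. Not the model used in the paper.
"""

ACTIONS = ["press_lever", "pull_chain"]                         # hypothetical actions
OUTCOME_OF = {"press_lever": "pellet", "pull_chain": "sucrose"}  # hypothetical task structure


class ModelFreeAgent:
    """Caches Q(a) from reward history; never represents which outcome an action yields."""

    def __init__(self, alpha=0.1):
        self.alpha = alpha
        self.q = {a: 0.0 for a in ACTIONS}

    def learn(self, action, reward):
        # Delta-rule update of the cached value (one-step TD, no successor state).
        self.q[action] += self.alpha * (reward - self.q[action])

    def choose(self, outcome_values):
        # Current outcome values are ignored: choice depends only on cached history.
        return max(ACTIONS, key=lambda a: self.q[a])


class ModelBasedAgent:
    """Learns an action -> outcome model and evaluates actions by looking ahead."""

    def __init__(self):
        self.counts = {a: {} for a in ACTIONS}

    def learn(self, action, outcome):
        self.counts[action][outcome] = self.counts[action].get(outcome, 0) + 1

    def value(self, action, outcome_values):
        # Expected value of the action under the learned outcome probabilities.
        n = sum(self.counts[action].values())
        if n == 0:
            return 0.0
        return sum(c / n * outcome_values[o] for o, c in self.counts[action].items())

    def choose(self, outcome_values):
        return max(ACTIONS, key=lambda a: self.value(a, outcome_values))


def simulate(n_trials=200):
    valued = {"pellet": 1.0, "sucrose": 0.8}    # hypothetical training values
    mf, mb = ModelFreeAgent(), ModelBasedAgent()
    for t in range(n_trials):
        action = ACTIONS[t % 2]                  # alternate so both actions are sampled equally
        outcome = OUTCOME_OF[action]
        mf.learn(action, valued[outcome])
        mb.learn(action, outcome)
    devalued = {"pellet": 0.0, "sucrose": 0.8}  # pellet devalued; tested without further learning
    before = (mf.choose(valued), mb.choose(valued))
    after = (mf.choose(devalued), mb.choose(devalued))
    return before, after


if __name__ == "__main__":
    before, after = simulate()
    print("before devaluation (MF, MB):", before)  # ('press_lever', 'press_lever')
    print("after devaluation  (MF, MB):", after)   # ('press_lever', 'pull_chain')
```

Because the test is run without further reinforcement (as in an extinction test), the model-free agent's cached values never change, so it keeps pressing the lever. The model-based agent re-evaluates through its learned action-outcome model and switches immediately.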
