Department of Psychology and Center for Brain Science, Harvard University, United States.
Cognition. 2018 Apr;173:34-42. doi: 10.1016/j.cognition.2017.12.014. Epub 2017 Dec 29.
The dilemma between information gathering (exploration) and reward seeking (exploitation) is a fundamental problem for reinforcement learning agents. How humans resolve this dilemma is still an open question, because experiments have provided equivocal evidence about the underlying algorithms used by humans. We show that two families of algorithms can be distinguished in terms of how uncertainty affects exploration. Algorithms based on uncertainty bonuses predict a change in response bias as a function of uncertainty, whereas algorithms based on sampling predict a change in response slope. Two experiments provide evidence for both bias and slope changes, and computational modeling confirms that a hybrid model is the best quantitative account of the data.
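The following is a minimal sketch, not the paper's fitted model, illustrating the distinction the abstract draws: an uncertainty-bonus (UCB-style) rule adds relative uncertainty to the value difference and so shifts the bias (intercept) of the choice function, whereas a posterior-sampling (Thompson-style) rule divides the value difference by total uncertainty and so changes its slope. The parameters `gamma` and `beta`, and the Gaussian-posterior assumption, are illustrative choices, not values from the study.

```python
# Sketch contrasting an uncertainty bonus vs. posterior sampling in a
# two-armed bandit. mu1, mu2: posterior mean rewards; s1, s2: posterior
# standard deviations. gamma, beta are assumed free parameters.
import numpy as np
from scipy.stats import norm

def p_choose1_ucb(mu1, mu2, s1, s2, gamma=1.0, beta=5.0):
    """Uncertainty bonus: relative uncertainty (s1 - s2) is added to the
    value difference, shifting the bias (intercept) of the choice curve."""
    return 1.0 / (1.0 + np.exp(-beta * ((mu1 - mu2) + gamma * (s1 - s2))))

def p_choose1_thompson(mu1, mu2, s1, s2):
    """Posterior sampling with Gaussian posteriors: the value difference is
    scaled by total uncertainty, changing the slope of the choice curve."""
    total_sd = np.sqrt(s1**2 + s2**2)
    return norm.cdf((mu1 - mu2) / total_sd)

# Equal means, but option 1 is more uncertain:
print(p_choose1_ucb(0.5, 0.5, 1.0, 0.2))       # > 0.5: biased toward option 1
print(p_choose1_thompson(0.5, 0.5, 1.0, 0.2))  # = 0.5, but a shallower slope in (mu1 - mu2)
```

A hybrid model of the kind the abstract describes would combine both terms, letting uncertainty move the intercept and the slope of the choice function at once.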