


Putting bandits into context: How function learning supports decision making.

Authors

Schulz Eric, Konstantinidis Emmanouil, Speekenbrink Maarten

Affiliations

Department of Experimental Psychology.

School of Psychology, University of New South Wales.

Publication

J Exp Psychol Learn Mem Cogn. 2018 Jun;44(6):927-943. doi: 10.1037/xlm0000463. Epub 2017 Nov 13.

DOI: 10.1037/xlm0000463
PMID: 29130693
Abstract

The authors introduce the contextual multi-armed bandit task as a framework to investigate learning and decision making in uncertain environments. In this novel paradigm, participants repeatedly choose between multiple options in order to maximize their rewards. The options are described by a number of contextual features which are predictive of the rewards through initially unknown functions. From their experience with choosing options and observing the consequences of their decisions, participants can learn about the functional relation between contexts and rewards and improve their decision strategy over time. In three experiments, the authors explore participants' behavior in such learning environments. They predict participants' behavior by context-blind (mean-tracking, Kalman filter) and contextual (Gaussian process and linear regression) learning approaches combined with different choice strategies. Participants are mostly able to learn about the context-reward functions and their behavior is best described by a Gaussian process learning strategy which generalizes previous experience to similar instances. In a relatively simple task with binary features, they seem to combine this learning with a probability of improvement decision strategy which focuses on alternatives that are expected to lead to an improvement upon a current favorite option. In a task with continuous features that are linearly related to the rewards, participants seem to more explicitly balance exploration and exploitation. Finally, in a difficult learning environment where the relation between features and rewards is nonlinear, some participants are again well-described by a Gaussian process learning strategy, whereas others revert to context-blind strategies. (PsycINFO Database Record)
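To make the abstract's best-fitting model concrete, here is a minimal sketch of a contextual bandit agent that learns the context-reward function with Gaussian-process (GP) regression and chooses via a probability-of-improvement rule. All names, kernel settings, and the toy environment are illustrative assumptions, not the authors' implementation.

```python
# Hedged sketch: GP function learning + probability-of-improvement (PoI) choice
# in a contextual bandit with binary features. Parameter values are assumed.
import math
import numpy as np

rng = np.random.default_rng(0)

def rbf_kernel(A, B, length=1.0, var=1.0):
    """Squared-exponential covariance between row vectors in A and B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return var * np.exp(-0.5 * d2 / length**2)

def gp_posterior(X, y, Xstar, noise=0.1):
    """Posterior mean and std of a zero-mean GP at the test contexts Xstar."""
    K = rbf_kernel(X, X) + noise * np.eye(len(X))
    Ks = rbf_kernel(X, Xstar)
    Kss = rbf_kernel(Xstar, Xstar)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    mu = Ks.T @ alpha
    v = np.linalg.solve(L, Ks)
    var = np.clip(np.diag(Kss) - (v**2).sum(0), 1e-9, None)
    return mu, np.sqrt(var)

def prob_improvement(mu, sd, best):
    """P(reward > current best) under each option's Gaussian posterior."""
    z = (mu - best) / sd
    return 0.5 * (1.0 + np.array([math.erf(x / math.sqrt(2)) for x in z]))

# Toy environment: 4 options, each described by binary context features,
# with an unknown (to the agent) linear context-reward function.
contexts = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])

def true_reward(x):
    return 2.0 * x[0] - 1.0 * x[1]

X_hist, y_hist = [], []
for t in range(40):
    if len(X_hist) < 4:                       # sample each option once first
        choice = len(X_hist)
    else:                                     # then pick by PoI over the GP
        mu, sd = gp_posterior(np.array(X_hist), np.array(y_hist), contexts)
        choice = int(np.argmax(prob_improvement(mu, sd, max(y_hist))))
    r = true_reward(contexts[choice]) + rng.normal(0, 0.1)
    X_hist.append(contexts[choice])
    y_hist.append(r)

print(f"best observed reward after 40 trials: {max(y_hist):.2f}")
```

Because the GP covaries nearby contexts, a reward observed for one option also sharpens predictions for options with similar features, which is the "generalizes previous experience to similar instances" property the abstract attributes to participants.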

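For contrast, the context-blind baseline in the model comparison can be sketched as one scalar Kalman filter per option, tracking each option's mean reward while ignoring the contextual features entirely. The class name and all parameter values below are illustrative assumptions.

```python
# Hedged sketch of a context-blind Kalman-filter mean tracker: one independent
# filter per option, no use of contextual features. Parameters are assumed.
import numpy as np

class KalmanBandit:
    """Tracks each option's mean reward with a scalar Kalman filter."""
    def __init__(self, n_options, prior_mean=0.0, prior_var=100.0,
                 obs_noise=1.0, innovation_var=0.0):
        self.mean = np.full(n_options, prior_mean)
        self.var = np.full(n_options, prior_var)
        self.obs_noise = obs_noise
        self.innovation_var = innovation_var   # >0 would model drifting rewards

    def update(self, option, reward):
        # Predict step: uncertainty grows everywhere if rewards can drift.
        self.var += self.innovation_var
        # Update step: the Kalman gain weighs prior uncertainty vs. noise.
        k = self.var[option] / (self.var[option] + self.obs_noise)
        self.mean[option] += k * (reward - self.mean[option])
        self.var[option] *= (1.0 - k)

rng = np.random.default_rng(1)
true_means = np.array([0.0, 1.0, 2.0])
agent = KalmanBandit(n_options=3)
for t in range(300):
    option = t % 3    # round-robin sampling, just to exercise the filter
    agent.update(option, true_means[option] + rng.normal(0, 1.0))

print("estimated means:", np.round(agent.mean, 1))
```

Because each filter is independent, nothing learned about one option transfers to the others; this is exactly the failure mode that makes contextual (GP or linear-regression) learners a better account of behavior when features do predict rewards.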

Similar Articles

1. Putting bandits into context: How function learning supports decision making.
J Exp Psychol Learn Mem Cogn. 2018 Jun;44(6):927-943. doi: 10.1037/xlm0000463. Epub 2017 Nov 13.
2. Finding structure in multi-armed bandits.
Cogn Psychol. 2020 Jun;119:101261. doi: 10.1016/j.cogpsych.2019.101261. Epub 2020 Feb 12.
3. Skilled bandits: Learning to choose in a reactive world.
J Exp Psychol Learn Mem Cogn. 2021 Jun;47(6):879-905. doi: 10.1037/xlm0000981. Epub 2020 Nov 30.
4. It's new, but is it good? How generalization and uncertainty guide the exploration of novel options.
J Exp Psychol Gen. 2020 Oct;149(10):1878-1907. doi: 10.1037/xge0000749. Epub 2020 Mar 19.
5. Uncertainty and exploration in a restless bandit problem.
Top Cogn Sci. 2015 Apr;7(2):351-67. doi: 10.1111/tops.12145. Epub 2015 Apr 20.
6. Generalization and Search in Risky Environments.
Cogn Sci. 2018 Nov;42(8):2592-2620. doi: 10.1111/cogs.12695. Epub 2018 Nov 2.
7. Sex differences in learning from exploration.
Elife. 2021 Nov 19;10:e69748. doi: 10.7554/eLife.69748.
8. Overtaking method based on sand-sifter mechanism: Why do optimistic value functions find optimal solutions in multi-armed bandit problems?
Biosystems. 2015 Sep;135:55-65. doi: 10.1016/j.biosystems.2015.06.009. Epub 2015 Jul 10.
9. Humans adaptively resolve the explore-exploit dilemma under cognitive constraints: Evidence from a multi-armed bandit task.
Cognition. 2022 Dec;229:105233. doi: 10.1016/j.cognition.2022.105233. Epub 2022 Jul 30.
10. Transcranial Stimulation over Frontopolar Cortex Elucidates the Choice Attributes and Neural Mechanisms Used to Resolve Exploration-Exploitation Trade-Offs.
J Neurosci. 2015 Oct 28;35(43):14544-56. doi: 10.1523/JNEUROSCI.2322-15.2015.

Cited By

1. Reconciling shared versus context-specific information in a neural network model of latent causes.
Sci Rep. 2024 Jul 22;14(1):16782. doi: 10.1038/s41598-024-64272-5.
2. Variability and harshness shape flexible strategy-use in support of the constrained flexibility framework.
Sci Rep. 2024 Mar 27;14(1):7236. doi: 10.1038/s41598-024-57800-w.
3. Interaction between decision-making and motor learning when selecting reach targets in the presence of bias and noise.
PLoS Comput Biol. 2023 Nov 2;19(11):e1011596. doi: 10.1371/journal.pcbi.1011596. eCollection 2023 Nov.
4. Signal detection models as contextual bandits.
R Soc Open Sci. 2023 Jun 21;10(6):230157. doi: 10.1098/rsos.230157. eCollection 2023 Jun.
5. Humans combine value learning and hypothesis testing strategically in multi-dimensional probabilistic reward learning.
PLoS Comput Biol. 2022 Nov 23;18(11):e1010699. doi: 10.1371/journal.pcbi.1010699. eCollection 2022 Nov.
6. Reinforcement learning with associative or discriminative generalization across states and actions: fMRI at 3 T and 7 T.
Hum Brain Mapp. 2022 Oct 15;43(15):4750-4790. doi: 10.1002/hbm.25988. Epub 2022 Jul 21.
7. Exploration heuristics decrease during youth.
Cogn Affect Behav Neurosci. 2022 Oct;22(5):969-983. doi: 10.3758/s13415-022-01009-9. Epub 2022 May 19.
8. Patterns of choice adaptation in dynamic risky environments.
Mem Cognit. 2022 May;50(4):864-881. doi: 10.3758/s13421-021-01244-4. Epub 2022 Mar 8.
9. Strategy Development and Feedback Processing During Complex Category Learning.
Front Psychol. 2021 Nov 10;12:672330. doi: 10.3389/fpsyg.2021.672330. eCollection 2021.
10. Priming exploration across domains: does search in a spatial environment influence search in a cognitive environment?
R Soc Open Sci. 2021 Aug 18;8(8):201944. doi: 10.1098/rsos.201944. eCollection 2021 Aug.