Learning in Noise: Dynamic Decision-Making in a Variable Environment.

Authors

Gureckis Todd M, Love Bradley C

Affiliation

New York University.

Publication

J Math Psychol. 2009 Jun;53(3):180-193. doi: 10.1016/j.jmp.2009.02.004.

DOI:10.1016/j.jmp.2009.02.004

PMID:20161328

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC2678746/

Abstract

In engineering systems, noise is a curse, obscuring important signals and increasing the uncertainty associated with measurement. However, the negative effects of noise and uncertainty are not universal. In this paper, we examine how people learn sequential control strategies given different sources and amounts of feedback variability. In particular, we consider people's behavior in a task where short- and long-term rewards are placed in conflict (i.e., the best option in the short-term is worst in the long-term). Consistent with a model based on reinforcement learning principles (Gureckis & Love, in press), we find that learners differentially weight information predictive of the current task state. In particular, when cues that signal state are noisy and uncertain, we find that participants' ability to identify an optimal strategy is strongly impaired relative to equivalent amounts of uncertainty that obscure the rewards/valuations of those states. In other situations, we find that noise and uncertainty in reward signals may paradoxically improve performance by encouraging exploration. Our results demonstrate how experimentally-manipulated task variability can be used to test predictions about the mechanisms that learners engage in dynamic decision making tasks.
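The kind of task the abstract describes can be illustrated with a small simulation. The sketch below is not the authors' task or model: the payoff numbers, the 10-trial history window, the Gaussian noise, and every name (`payoff`, `simulate`, `cue_sd`, `reward_sd`) are assumptions made for illustration, and the learner is a generic tabular Q-learning agent. It shows one way to set up an option that pays more now but less over time, and to inject noise either into the cue that signals the state or into the reward itself.

```python
import random
from collections import deque

LONG, SHORT = 0, 1   # hypothetical action labels
WINDOW = 10          # assumed number of past trials that defines the task state


def payoff(action, n_long):
    """Noise-free reward; n_long counts LONG choices in the last WINDOW trials."""
    base = 400 if action == LONG else 900   # SHORT always pays 500 more right now
    return base + 100 * n_long              # each recent LONG choice lifts both options


def simulate(cue_sd=0.0, reward_sd=0.0, trials=500, alpha=0.1, gamma=0.9,
             epsilon=0.1, seed=0):
    """Run an epsilon-greedy Q-learner; return the fraction of LONG choices."""
    rng = random.Random(seed)
    history = deque([SHORT] * WINDOW, maxlen=WINDOW)
    q = [[0.0, 0.0] for _ in range(WINDOW + 1)]   # q[observed state][action]

    def observe():
        # The agent sees the true state plus Gaussian cue noise, clipped to range.
        noisy = round(history.count(LONG) + rng.gauss(0.0, cue_sd))
        return min(WINDOW, max(0, noisy))

    state = observe()
    long_choices = 0
    for _ in range(trials):
        if rng.random() < epsilon:
            action = rng.choice((LONG, SHORT))      # explore
        else:                                       # exploit (ties go to LONG)
            action = LONG if q[state][LONG] >= q[state][SHORT] else SHORT
        # The reward depends on the true state; reward noise is added on top.
        reward = payoff(action, history.count(LONG)) + rng.gauss(0.0, reward_sd)
        history.append(action)
        next_state = observe()
        target = reward + gamma * max(q[next_state])
        q[state][action] += alpha * (target - q[state][action])
        state = next_state
        long_choices += action == LONG
    return long_choices / trials
```

Under these assumed payoffs, always choosing `SHORT` settles at 900 per trial while always choosing `LONG` settles at 1400, even though `SHORT` is worth 500 more on any single trial. Comparing `simulate(cue_sd=2.0)` with `simulate(reward_sd=200.0)` over many seeds is one way to probe the contrast between state-cue noise and reward noise; the sketch makes no claim about which way that comparison comes out.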

Similar articles

Short-term gains, long-term pains: how cues about state aid learning in dynamic environments.

Cognition. 2009 Dec;113(3):293-313. doi: 10.1016/j.cognition.2009.03.013. Epub 2009 May 8.

It's new, but is it good? How generalization and uncertainty guide the exploration of novel options.

J Exp Psychol Gen. 2020 Oct;149(10):1878-1907. doi: 10.1037/xge0000749. Epub 2020 Mar 19.

Working-memory load and temporal myopia in dynamic decision making.

J Exp Psychol Learn Mem Cogn. 2012 Nov;38(6):1640-58. doi: 10.1037/a0028146. Epub 2012 Apr 30.

Sex differences in learning from exploration.

Elife. 2021 Nov 19;10:e69748. doi: 10.7554/eLife.69748.

Cognitive mechanisms of learning in sequential decision-making under uncertainty: an experimental and theoretical approach.

Front Behav Neurosci. 2024 Aug 12;18:1399394. doi: 10.3389/fnbeh.2024.1399394. eCollection 2024.

Putting bandits into context: How function learning supports decision making.

J Exp Psychol Learn Mem Cogn. 2018 Jun;44(6):927-943. doi: 10.1037/xlm0000463. Epub 2017 Nov 13.

How environmental regularities affect people's information search in probability judgments from experience.

J Exp Psychol Learn Mem Cogn. 2019 Feb;45(2):219-231. doi: 10.1037/xlm0000572. Epub 2018 Jul 19.

Learning and choosing in an uncertain world: An investigation of the explore-exploit dilemma in static and dynamic environments.

Cogn Psychol. 2016 Mar;85:43-77. doi: 10.1016/j.cogpsych.2016.01.001. Epub 2016 Jan 21.

"Simultaneously Vague and Oddly Specific": Understanding Autistic People's Experiences of Decision Making and Research Questionnaires.

Autism Adulthood. 2023 Sep 1;5(3):263-274. doi: 10.1089/aut.2022.0039. Epub 2023 Aug 30.

Cited by

A behavioral dataset of predictive decisions given trends in information across adulthood.

Data Brief. 2024 Aug 10;56:110832. doi: 10.1016/j.dib.2024.110832. eCollection 2024 Oct.

A dose-ranging study of the physiological and self-reported effects of repeated, rapid infusion of remifentanil in people with opioid use disorder and physical dependence on fentanyl.

Psychopharmacology (Berl). 2024 Jun;241(6):1227-1236. doi: 10.1007/s00213-024-06557-1. Epub 2024 Feb 22.

Cognitive profile in Restless Legs Syndrome: A signal-to-noise ratio account.

Curr Res Neurobiol. 2021 Aug 8;2:100021. doi: 10.1016/j.crneur.2021.100021. eCollection 2021.

Value-free random exploration is linked to impulsivity.

Nat Commun. 2022 Aug 4;13(1):4542. doi: 10.1038/s41467-022-31918-9.

Focusing on cognitive potential as the bright side of mental atypicality.

Commun Biol. 2022 Mar 1;5(1):188. doi: 10.1038/s42003-022-03126-0.

The Downsides of Cognitive Enhancement.

Neuroscientist. 2021 Aug;27(4):322-330. doi: 10.1177/1073858420945971. Epub 2020 Jul 30.

Approaches to Cognitive Modeling in Dynamic Systems Control.

Front Psychol. 2017 Nov 29;8:2032. doi: 10.3389/fpsyg.2017.02032. eCollection 2017.

Ostracism Reduces Reliance on Poor Advice from Others during Decision Making.

J Behav Decis Mak. 2016 Oct;29(4):409-418. doi: 10.1002/bdm.1886. Epub 2015 Jun 12.

Who Chokes Under Pressure? The Big Five Personality Traits and Decision-Making under Pressure.

Pers Individ Dif. 2015 Feb;74:22-28. doi: 10.1016/j.paid.2014.10.009. Epub 2014 Oct 23.

Global Cue Inconsistency Diminishes Learning of Cue Validity.

Front Psychol. 2016 Nov 11;7:1743. doi: 10.3389/fpsyg.2016.01743. eCollection 2016.

References

Short-term gains, long-term pains: how cues about state aid learning in dynamic environments.

Cognition. 2009 Dec;113(3):293-313. doi: 10.1016/j.cognition.2009.03.013. Epub 2009 May 8.

Bayesian approaches to associative learning: from passive to active learning.

Learn Behav. 2008 Aug;36(3):210-26. doi: 10.3758/lb.36.3.210.

Regulatory fit effects in a choice task.

Psychon Bull Rev. 2007 Dec;14(6):1125-32. doi: 10.3758/bf03193101.

Optimization by simulated annealing.

Science. 1983 May 13;220(4598):671-80. doi: 10.1126/science.220.4598.671.

Reconciling reinforcement learning models with behavioral extinction and renewal: implications for addiction, relapse, and problem gambling.

Psychol Rev. 2007 Jul;114(3):784-805. doi: 10.1037/0033-295X.114.3.784.

Short-term memory traces for action bias in human reinforcement learning.

Brain Res. 2007 Jun 11;1153:111-21. doi: 10.1016/j.brainres.2007.03.057. Epub 2007 Mar 24.

A test of the regulatory fit hypothesis in perceptual classification learning.

Mem Cognit. 2006 Oct;34(7):1377-97. doi: 10.3758/bf03195904.

Discounting of delayed rewards: Models of individual choice.

J Exp Anal Behav. 1995 Nov;64(3):263-76. doi: 10.1901/jeab.1995.64-263.

Cortical substrates for exploratory decisions in humans.

Nature. 2006 Jun 15;441(7095):876-9. doi: 10.1038/nature04766.

From recurrent choice to skill learning: a reinforcement-learning model.

J Exp Psychol Gen. 2006 May;135(2):184-206. doi: 10.1037/0096-3445.135.2.184.
