Gureckis Todd M, Love Bradley C
New York University.
J Math Psychol. 2009 Jun;53(3):180-193. doi: 10.1016/j.jmp.2009.02.004.
In engineering systems, noise is a curse, obscuring important signals and increasing the uncertainty associated with measurement. However, the negative effects of noise and uncertainty are not universal. In this paper, we examine how people learn sequential control strategies given different sources and amounts of feedback variability. In particular, we consider people's behavior in a task where short- and long-term rewards are placed in conflict (i.e., the best option in the short term is the worst in the long term). Consistent with a model based on reinforcement learning principles (Gureckis & Love, in press), we find that learners differentially weight information predictive of the current task state. Specifically, when cues that signal state are noisy and uncertain, participants' ability to identify an optimal strategy is impaired far more than when an equivalent amount of uncertainty obscures the rewards/valuations of those states. In other situations, we find that noise and uncertainty in reward signals may paradoxically improve performance by encouraging exploration. Our results demonstrate how experimentally manipulated task variability can be used to test predictions about the mechanisms that learners engage in dynamic decision-making tasks.