Temporal discounting correlates with directed exploration but not with random exploration.

Affiliations

Department of Psychology, University of Arizona, Tucson, USA.

Department of Psychological Science, Missouri University of Science and Technology, Rolla, USA.

Publication information

Sci Rep. 2020 Mar 4;10(1):4020. doi: 10.1038/s41598-020-60576-4.

DOI:10.1038/s41598-020-60576-4

PMID:32132573

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7055215/

Abstract

The explore-exploit dilemma describes the trade-off that occurs any time we must choose between exploring unknown options and exploiting options we know well. Implicit in this trade-off is how we value future rewards: exploiting is usually better in the short term, but in the longer term the benefits of exploration can be huge. Thus, in theory, there should be a tight connection between how much people value future rewards (i.e., how much they discount future rewards relative to immediate rewards) and how likely they are to explore, with less 'temporal discounting' associated with more exploration. By measuring individual differences in temporal discounting and correlating them with explore-exploit behavior, we tested whether this theoretical prediction holds in practice. We used the 27-item Delay-Discounting Questionnaire to estimate temporal discounting and the Horizon Task to quantify two strategies of explore-exploit behavior: directed exploration, where information drives exploration by choice, and random exploration, where behavioral variability drives exploration by chance. We find a clear correlation between temporal discounting and directed exploration, with more temporal discounting associated with less directed exploration. Conversely, we find no relationship between temporal discounting and random exploration. Unexpectedly, we find that the relationship with directed exploration appears to be driven by a correlation between temporal discounting and uncertainty seeking at short time horizons, rather than information seeking at long horizons. Taken together, our results suggest a nuanced relationship between temporal discounting and explore-exploit behavior that may be mediated by multiple factors.
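
As an illustration of what "temporal discounting" quantifies, the sketch below assumes the hyperbolic form V = A / (1 + kD) that is commonly used to score delay-discounting questionnaires; the abstract itself does not state the model or the scoring procedure. The function names and the example amounts are hypothetical and are not taken from the paper or the questionnaire.

```python
def hyperbolic_value(amount, delay_days, k):
    """Subjective present value of a delayed reward, assuming V = A / (1 + k * D)."""
    return amount / (1.0 + k * delay_days)


def indifference_k(immediate, delayed, delay_days):
    """Discount rate k at which the immediate and delayed rewards are valued equally.

    Solving immediate = delayed / (1 + k * D) for k gives
    k = (delayed / immediate - 1) / D.
    """
    return (delayed / immediate - 1.0) / delay_days


def prefers_delayed(immediate, delayed, delay_days, k):
    """True if a person with discount rate k values the delayed reward more."""
    return hyperbolic_value(delayed, delay_days, k) > immediate


# Hypothetical item: 50 units today versus 100 units in 100 days.
k_star = indifference_k(50, 100, 100)

# A person who discounts less steeply than k_star waits; one who discounts more does not.
patient = prefers_delayed(50, 100, 100, k=k_star / 2)
impatient = prefers_delayed(50, 100, 100, k=k_star * 2)
```

Under this assumed model, a larger k means steeper discounting of future rewards, which is the direction the abstract associates with less directed exploration.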
