Exploration and recency as the main proximate causes of probability matching: a reinforcement learning analysis.

Affiliations

Department of General Physics, Institute of Physics, University of São Paulo, Rua do Matão Nr. 1371, Cidade Universitária, CEP 05508-090, São Paulo, SP, Brazil.

Department of Physiology and Biophysics, Institute of Biomedical Sciences, University of São Paulo, Av. Prof. Lineu Prestes, 1524, ICB-I, Cidade Universitária, CEP 05508-000, São Paulo, SP, Brazil.

Publication information

Sci Rep. 2017 Nov 10;7(1):15326. doi: 10.1038/s41598-017-15587-z.

DOI:10.1038/s41598-017-15587-z

PMID:29127418

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC5681695/

Abstract

Research has not yet reached a consensus on why humans match probabilities instead of maximising in a probability learning task. The most influential explanation is that they search for patterns in the random sequence of outcomes. Other explanations, such as expectation matching, are plausible, but do not consider how reinforcement learning shapes people's choices. We aimed to quantify how human performance in a probability learning task is affected by pattern search and reinforcement learning. We collected behavioural data from 84 young adult participants who performed a probability learning task wherein the majority outcome was rewarded with 0.7 probability, and analysed the data using a reinforcement learning model that searches for patterns. Model simulations indicated that pattern search, exploration, recency (discounting early experiences), and forgetting may impair performance. Our analysis estimated that 85% (95% HDI [76, 94]) of participants searched for patterns and believed that each trial outcome depended on one or two previous ones. The estimated impact of pattern search on performance was, however, only 6%, while those of exploration and recency were 19% and 13% respectively. This suggests that probability matching is caused by uncertainty about how outcomes are generated, which leads to pattern search, exploration, and recency.
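
As a rough illustration of how exploration and recency can pull choices away from maximising, the sketch below simulates a simple two-option learner on a task where the majority outcome occurs with probability 0.7. It is not the authors' model (theirs also searches for patterns and includes forgetting). The function names, the learning rate `alpha` (higher values weight recent outcomes more, i.e. stronger recency), the inverse temperature `beta` (lower values mean more exploration), and the assumption that each trial reveals which option was correct are all illustrative choices.

```python
import math
import random


def expected_accuracy(choice_prob, p_majority=0.7):
    """Expected proportion correct when the majority option is chosen
    with probability choice_prob on every trial."""
    return choice_prob * p_majority + (1 - choice_prob) * (1 - p_majority)


def simulate(alpha, beta, n_trials=300, p_majority=0.7, seed=0):
    """Return the fraction of trials on which a hypothetical learner
    chose the majority option.

    alpha: learning rate (assumed stand-in for recency).
    beta:  softmax inverse temperature (assumed stand-in for exploration);
           keep it moderate to avoid overflow in math.exp.
    """
    rng = random.Random(seed)
    q = [0.5, 0.5]  # value estimates; index 0 is the majority option
    majority_choices = 0
    for _ in range(n_trials):
        # Softmax over two options is a logistic function of the value difference.
        p_choose_0 = 1.0 / (1.0 + math.exp(-beta * (q[0] - q[1])))
        choice = 0 if rng.random() < p_choose_0 else 1
        outcome = 0 if rng.random() < p_majority else 1
        # Assumption: the outcome shows which option was correct,
        # so both value estimates are updated on every trial.
        for option in (0, 1):
            reward = 1.0 if option == outcome else 0.0
            q[option] += alpha * (reward - q[option])
        if choice == 0:
            majority_choices += 1
    return majority_choices / n_trials


if __name__ == "__main__":
    print("matching accuracy:  ", expected_accuracy(0.7))
    print("maximising accuracy:", expected_accuracy(1.0))
    for alpha, beta in [(0.05, 20.0), (0.05, 1.0), (0.5, 20.0)]:
        print(alpha, beta, simulate(alpha, beta, n_trials=2000))
```

With a 0.7 majority probability, strict probability matching yields an expected accuracy of 0.7 × 0.7 + 0.3 × 0.3 = 0.58, against 0.70 for always choosing the majority option. In this sketch, lowering `beta` or raising `alpha` tends to reduce how often the simulated learner picks the majority option, which is the qualitative effect the abstract attributes to exploration and recency.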
