Center for Neurocognitive Research (MEG-Center), Moscow State University of Psychology and Education, 29 Sretenka str, Moscow, 127051, Russia.
Cogn Affect Behav Neurosci. 2022 Oct;22(5):1108-1129. doi: 10.3758/s13415-022-00996-z. Epub 2022 Apr 1.
This study examined whether pupil size and response time would distinguish directed exploration from random exploration and exploitation. Eighty-nine participants performed a two-choice probabilistic learning task while their pupil size and response time were continuously recorded. Using linear mixed-model (LMM) analysis, we estimated differences in pupil size and response time between advantageous and disadvantageous choices as a function of learning success, i.e., whether or not a participant had learned the probabilistic contingency between choices and their outcomes. We proposed that before the true value of each choice became known to the decision-maker, both advantageous and disadvantageous choices represented random exploration of two options with equally uncertain outcomes, whereas after learning, the same choices manifested exploitation and directed exploration strategies, respectively. We found that disadvantageous choices were associated with increases in both response time and pupil size, but only after participants had learned the choice-reward contingencies. For pupil size, this effect was strongly amplified for disadvantageous choices that immediately followed gains, as compared with losses, in the preceding choice. Pupil size modulations were evident during the behavioral choice rather than during the pretrial baseline. These findings suggest that occasional disadvantageous choices, which violate the acquired internal utility model, represent directed exploration. This exploratory strategy shifts choice priorities in favor of information seeking, and its autonomic and behavioral concomitants are mainly driven by the conflict between the behavioral plan of the intended exploratory choice and its strong alternative, which has already proven to be more rewarding.