Systems Neurobiology Laboratories, Salk Institute for Biological Studies, 10010 N. Torrey Pines Road, La Jolla, CA 92037, USA.
Exp Brain Res. 2008 May;187(2):321-30. doi: 10.1007/s00221-008-1306-z. Epub 2008 Mar 11.
Markov chains (stochastic processes in which probabilities are assigned based on the previous outcome) are commonly used to examine the transitions between behavioral states, such as those that occur during foraging or social interactions. However, relatively little is known about how well primates can incorporate knowledge about Markov chains into their behavior. Saccadic eye movements are an example of a simple behavior influenced by information about probability, and thus are good candidates for testing whether subjects can learn Markov chains. In addition, when investigating the influence of probability on saccade target selection, the use of Markov chains could provide an alternative method that avoids confounds present in other task designs. To investigate these possibilities, we evaluated human behavior on a task in which stimulus reward probabilities were assigned using a Markov chain. On each trial, the subject selected one of four identical stimuli by saccade; after selection, feedback indicated the rewarded stimulus. Each session consisted of 200-600 trials, and on some sessions, the reward magnitude varied. On sessions with a uniform reward, subjects (n = 6) learned to select stimuli at a frequency close to reward probability, which is similar to human behavior on matching or probability classification tasks. When informed that a Markov chain assigned reward probabilities, subjects (n = 3) learned to select the stimulus with the greatest reward probability more often, bringing them close to behavior that maximizes reward. On sessions where reward magnitude varied across stimuli, subjects (n = 6) demonstrated preferences for both greater reward probability and greater reward magnitude, resulting in a preference for greater expected value (the product of reward probability and magnitude). These results demonstrate that Markov chains can be used to dynamically assign probabilities that are rapidly exploited by human subjects during saccade target selection.
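The task design described above can be sketched in code: a Markov chain picks which of the four stimuli is rewarded on each trial, conditioned on the previously rewarded stimulus, and expected value is the product of reward probability and magnitude. This is a minimal illustrative sketch only; the transition matrix values, trial counts, and function names below are hypothetical, as the paper's actual transition probabilities are not given in the abstract.

```python
import random

# Hypothetical 4-state transition matrix: row i gives the probability that
# each of the four stimuli is rewarded on the next trial, given that
# stimulus i was rewarded on the previous trial. Values are illustrative,
# not the ones used in the study.
TRANSITIONS = [
    [0.10, 0.60, 0.15, 0.15],
    [0.15, 0.10, 0.60, 0.15],
    [0.15, 0.15, 0.10, 0.60],
    [0.60, 0.15, 0.15, 0.10],
]

def simulate_rewards(n_trials, seed=0):
    """Draw the rewarded stimulus on each trial from the Markov chain."""
    rng = random.Random(seed)
    state = rng.randrange(4)  # initial rewarded stimulus
    rewarded = []
    for _ in range(n_trials):
        # Next rewarded stimulus depends only on the previous one.
        state = rng.choices(range(4), weights=TRANSITIONS[state])[0]
        rewarded.append(state)
    return rewarded

def expected_value(reward_probability, reward_magnitude):
    """Expected value = reward probability x reward magnitude."""
    return reward_probability * reward_magnitude

# A session of 600 trials, the upper end of the range in the abstract.
rewards = simulate_rewards(600)
print(len(rewards))
```

A reward-maximizing strategy in this setting would always choose the stimulus with the largest entry in the row of `TRANSITIONS` indexed by the previously rewarded stimulus, whereas matching behavior selects stimuli in proportion to those entries.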