Sakai Yutaka, Fukai Tomoki
Brain Science Institute, Tamagawa University, Machida, Tokyo, Japan.
PLoS One. 2008;3(11):e3795. doi: 10.1371/journal.pone.0003795. Epub 2008 Nov 24.
Which strategies subjects follow in different behavioral circumstances is a central issue in the study of decision making. In particular, whether maximizing or matching is the more fundamental strategy in animals' decision behavior has been a matter of debate. Here, we prove that any algorithm that seeks the stationary condition for maximizing the average reward leads to matching when it ignores the dependence of the expected outcome on the subject's past choices. We term this strategy of partial reward maximization the "matching strategy". We then apply this strategy to the case where the subject's decision system updates the information it uses to make a decision. Such information includes the subject's past actions or sensory stimuli, and the internal storage of this information is often called "state variables". We demonstrate that the matching strategy provides an easy way to maximize reward when it is combined with exploration of the state variables that correctly represent the information crucial for reward maximization. Our results reveal for the first time how a strategy that achieves matching behavior is beneficial to reward maximization, providing novel insight into the relationship between maximizing and matching.
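The central claim can be illustrated with a short derivation. This is a sketch under assumptions made here for illustration, not the paper's own formulation: choices are parameterized by probabilities $p_a$ with $\sum_a p_a = 1$, and the symbols $R$, $I_a$, and $\lambda$ are notation introduced for this sketch.

```latex
% Average reward as a function of the choice probabilities p = (p_1, ..., p_n):
\[
  R(p) = \sum_a p_a \,\mathbb{E}[r \mid a;\, p], \qquad \sum_a p_a = 1 .
\]
% Full stationary condition for maximizing R (Lagrange multiplier \lambda):
\[
  \frac{\partial R}{\partial p_a}
  = \mathbb{E}[r \mid a;\, p]
  + \sum_b p_b \,\frac{\partial\, \mathbb{E}[r \mid b;\, p]}{\partial p_a}
  = \lambda .
\]
% Ignoring the dependence of the expected outcome on past choices
% drops the second term, leaving equal expected reward per choice:
\[
  \mathbb{E}[r \mid a;\, p] = \lambda \quad \text{for every } a \text{ with } p_a > 0 .
\]
% The income from option a is I_a = p_a E[r | a; p] = \lambda p_a, hence
\[
  \frac{I_a}{\sum_b I_b} = \frac{\lambda\, p_a}{\lambda \sum_b p_b} = p_a ,
\]
% i.e. the fraction of reward earned from a equals the fraction of choices of a.
```

The last line is the matching law. The neglected term, $\sum_b p_b \,\partial \mathbb{E}[r \mid b;\, p] / \partial p_a$, is the part of the gradient that separates matching from full maximization. When the expected outcome does not depend on past choices, this term vanishes and the two conditions coincide.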