Sakamoto Kazuhiro, Yamada Hinata, Kawaguchi Norihiko, Furusawa Yoshito, Saito Naohiro, Mushiake Hajime
Department of Neuroscience, Faculty of Medicine, Tohoku Medical and Pharmaceutical University, Sendai, Japan.
Department of Physiology, Tohoku University School of Medicine, Sendai, Japan.
Front Comput Neurosci. 2022 Jun 2;16:784604. doi: 10.3389/fncom.2022.784604. eCollection 2022.
Learning is a crucial basis for biological systems to adapt to their environments. Environments comprise various states or episodes, and episode-dependent learning is essential for adaptation to such complex situations. Here, we developed a model for learning a two-target search task used in primate physiological experiments. In the task, the agent is required to gaze at one of four presented light spots. Two neighboring spots serve alternately as the correct target, and the correct target pair is switched after a certain number of consecutive successes. To obtain rewards with high probability, the agent must make decisions based on the actions and outcomes of the previous two trials. Our previous work achieved this by using a dynamic state space. However, to learn a task that includes events such as fixation on the initial central spot, the model framework must be extended. For this purpose, here we propose a "history-in-episode architecture." Specifically, we divide states into episodes and histories, and actions are selected based on the histories within each episode. When we compared the proposed model, including the dynamic state space, with the conventional SARSA method on the two-target search task, the former performed close to the theoretical optimum, while the latter never achieved a target-pair switch because it had to relearn each correct target every time. The reinforcement learning model incorporating the proposed history-in-episode architecture and dynamic state space enables episode-dependent learning and provides a basis for learning systems that adapt flexibly to complex environments.
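The abstract's core idea can be illustrated with a minimal sketch: a toy version of the two-target search task, and an agent whose state is the history of the previous two (action, reward) pairs. This is an assumption-laden simplification, not the paper's model: task details (pair geometry, switch criterion) and the agent (a one-step, bandit-style SARSA update over history states) are hypothetical stand-ins, and the fixation events and dynamic state space of the actual architecture are omitted.

```python
import random
from collections import defaultdict

class TwoTargetSearchTask:
    """Toy two-target search task (illustrative, not the paper's exact task):
    four spots (0-3); two neighboring spots form the correct pair; the pair
    members serve alternately as the correct target; the pair switches after
    `switch_after` consecutive successes."""
    PAIRS = [(0, 1), (1, 2), (2, 3), (3, 0)]  # assumed neighbor pairs

    def __init__(self, switch_after=7, seed=0):
        self.rng = random.Random(seed)
        self.switch_after = switch_after
        self.pair = self.rng.choice(self.PAIRS)
        self.turn = 0      # which member of the pair is currently correct
        self.streak = 0    # consecutive successes since last switch

    def step(self, action):
        correct = self.pair[self.turn]
        reward = 1 if action == correct else 0
        if reward:
            self.turn = 1 - self.turn  # targets alternate on success (assumed)
            self.streak += 1
            if self.streak >= self.switch_after:
                old = self.pair
                while self.pair == old:          # switch to a different pair
                    self.pair = self.rng.choice(self.PAIRS)
                self.streak = 0
        else:
            self.streak = 0
        return reward

def run_history_agent(task, n_trials=2000, alpha=0.1, epsilon=0.1, seed=1):
    """Epsilon-greedy agent keyed on the history of the previous two
    (action, reward) pairs -- a minimal stand-in for history-based states.
    Uses a one-step update toward the immediate reward."""
    rng = random.Random(seed)
    Q = defaultdict(float)
    actions = [0, 1, 2, 3]
    history = ((None, None), (None, None))  # last two (action, reward) pairs
    total = 0
    for _ in range(n_trials):
        if rng.random() < epsilon:
            a = rng.choice(actions)
        else:
            a = max(actions, key=lambda x: Q[(history, x)])
        r = task.step(a)
        total += r
        Q[(history, a)] += alpha * (r - Q[(history, a)])
        history = (history[1], (a, r))  # slide the two-trial window
    return total / n_trials
```

The point of the sketch is structural: because the correct target depends on what happened over the previous two trials, a state built from that history can (in principle) support an optimal policy, whereas a state that ignores history cannot distinguish the alternating targets.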