Homma Shogo, Takezawa Masanori
Department of Behavioral Science, Graduate School of Humanities and Human Sciences, Hokkaido University, Sapporo, Hokkaido, Japan.
Japan Society for the Promotion of Science, Tokyo, Japan.
PLoS One. 2024 Aug 1;19(8):e0307991. doi: 10.1371/journal.pone.0307991. eCollection 2024.
The optimization of cognitive and learning mechanisms can reveal complicated behavioral phenomena. In this study, we focused on reinforcement learning that uses different learning rules for positive and negative reward prediction errors. We attempted to relate the evolved learning bias to complex features of risk preference, such as its domain-specific behavioral manifestations and the relatively stable domain-general factor underlying behavior. Simulations of the evolution of the two learning rates under diverse risky environments showed that the positive learning rate evolved, on average, to be higher than the negative one when agents experienced both tasks in which risk aversion was more rewarding and tasks in which risk seeking was more rewarding. This evolution enabled agents to flexibly choose the more rewarding behavior depending on the task type. The evolved agents also demonstrated behavioral patterns described by prospect theory. Our simulations captured two aspects of the evolution of risk preference: the domain-specific aspect, namely behavior acquired through learning in a specific context; and the implicit domain-general aspect, corresponding to learning rates shaped through evolution to behave adaptively in a wide range of environments. These results imply that our framework of learning under innate constraints may be useful in understanding complicated behavioral phenomena.
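The abstract does not give the update equations. The following is a minimal sketch, assuming the common delta-rule (Rescorla-Wagner style) value update with a separate learning rate for positive (alpha_pos) and negative (alpha_neg) reward prediction errors. It illustrates why a higher positive learning rate biases the learned value of a risky option upward. The function names, the two-outcome gamble, and all parameter values are hypothetical illustrations, not the paper's model or settings.

```python
import random


def asymmetric_update(q, reward, alpha_pos, alpha_neg):
    """Delta-rule update with separate rates for positive/negative prediction errors."""
    delta = reward - q  # reward prediction error
    alpha = alpha_pos if delta > 0 else alpha_neg
    return q + alpha * delta


def fixed_point(p_high, r_high, r_low, alpha_pos, alpha_neg):
    """Value at which the expected update is zero for a gamble paying r_high with
    probability p_high and r_low otherwise (assumes r_low < q < r_high)."""
    w_high = p_high * alpha_pos          # weight on positive errors
    w_low = (1.0 - p_high) * alpha_neg   # weight on negative errors
    return (w_high * r_high + w_low * r_low) / (w_high + w_low)


def time_averaged_value(p_high, r_high, r_low, alpha_pos, alpha_neg,
                        n_trials=20000, burn_in=10000, seed=0):
    """Simulate repeated draws from the gamble and average q after a burn-in period."""
    rng = random.Random(seed)
    q, total = 0.0, 0.0
    for t in range(n_trials):
        reward = r_high if rng.random() < p_high else r_low
        q = asymmetric_update(q, reward, alpha_pos, alpha_neg)
        if t >= burn_in:
            total += q
    return total / (n_trials - burn_in)


if __name__ == "__main__":
    # Hypothetical gamble: 2 or 0 with equal probability (mean 1); a sure option pays 1.
    for a_pos, a_neg in [(0.1, 0.1), (0.1, 0.05)]:
        print(a_pos, a_neg,
              round(fixed_point(0.5, 2.0, 0.0, a_pos, a_neg), 3),
              round(time_averaged_value(0.5, 2.0, 0.0, a_pos, a_neg), 3))
```

With equal rates the gamble's learned value settles at its mean (1.0), matching the sure option. With alpha_pos = 0.1 and alpha_neg = 0.05 the fixed point is 4/3, so a greedy agent comparing learned values would prefer the gamble. The simulated time average should land close to these fixed points, because in this example the expected update is linear in q between the two outcomes.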