Chaturvedi Pratik, Dutt Varun
Applied Cognitive Science Laboratory, Indian Institute of Technology Mandi, Mandi, India.
Defence Terrain Research Laboratory, Defence Research and Development Organization, New Delhi, India.
Front Psychol. 2021 Feb 10;11:499422. doi: 10.3389/fpsyg.2020.499422. eCollection 2020.
Prior research has used an Interactive Landslide Simulator (ILS) tool to investigate human decision making against landslide risks. That research found that repeated feedback in the ILS tool about damages due to landslides improves human decisions against landslide risks. However, little is known about how theories of learning from feedback (e.g., reinforcement learning) would account for human decisions in the ILS tool. The primary goal of this paper is to account for human decisions in the ILS tool via computational models based upon reinforcement learning (RL) and to explore the model mechanisms involved when people make decisions in the ILS tool. Four different RL models were developed and evaluated on their ability to capture human decisions in an experiment involving two conditions in the ILS tool. The parameters of an Expectancy-Valence (EV) model, two Prospect-Valence-Learning models (PVL and PVL-2), a combination EV-PU model, and a random model were calibrated to human decisions in the ILS tool across the two conditions. The models, with their calibrated parameters, were then generalized to data collected in an experiment involving a new condition in the ILS tool. When generalized to this new condition, the PVL-2 model, with parameters calibrated in each of the two damage-feedback conditions, outperformed all other models, including the random model. We highlight the implications of our results for decision making against landslide risks.
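For readers unfamiliar with the model family named above, the Expectancy-Valence model is commonly described in the wider literature as three steps: a weighted valence of gains and losses, a delta-rule update of the chosen option's expectancy, and a softmax choice rule. The following is a minimal sketch of that generic form only. It is not the implementation calibrated in this study: the function names, the two-option setup, and all parameter values are illustrative assumptions, and the ILS-specific decision variables are not modeled.

```python
import math


def ev_utility(gain, loss, w):
    """Weighted valence; w in [0, 1] is the attention weight on losses."""
    return (1.0 - w) * gain - w * abs(loss)


def update_expectancy(expectancy, utility, a):
    """Delta rule: move the expectancy toward the latest utility at rate a."""
    return expectancy + a * (utility - expectancy)


def choice_probabilities(expectancies, theta):
    """Softmax over expectancies; theta is the choice sensitivity."""
    scaled = [theta * e for e in expectancies]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [x / total for x in exps]


# Hypothetical trial: option 0 was chosen and produced a gain and a loss.
expectancies = [0.0, 0.0]
u = ev_utility(gain=10.0, loss=-4.0, w=0.5)
expectancies[0] = update_expectancy(expectancies[0], u, a=0.2)
probs = choice_probabilities(expectancies, theta=1.0)
```

Under these assumed values, the chosen option's expectancy rises above zero, so the softmax assigns it a higher choice probability on the next trial. Calibration, as described in the abstract, would amount to searching for the values of `w`, `a`, and `theta` that best reproduce observed human choices.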