Neurosciences Graduate Training Program, Stanford University, Stanford, CA 94305, USA; Helen Wills Neuroscience Institute, University of California, Berkeley, CA, 94720, USA; Department of Psychology, Arizona State University, Tempe, AZ 85287, USA.
Department of Psychology, Arizona State University, Tempe, AZ 85287, USA.
J Neurosci Methods. 2019 Apr 1;317:37-44. doi: 10.1016/j.jneumeth.2019.01.006. Epub 2019 Jan 18.
Reinforcement learning models provide excellent descriptions of learning in multiple species across a variety of tasks. Many researchers are interested in relating parameters of reinforcement learning models to neural measures, psychological variables, or experimental manipulations. We demonstrate that parameter identification is difficult because a wide range of parameter values provides approximately equally good fits to the data. This identification problem has a large impact on statistical power: we show that a researcher who wants to detect a medium-sized correlation (r = .3) between a variable and learning rate with 80% power must collect 60% more subjects than a typical power analysis specifies, in order to account for the noise introduced by model fitting.
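The identifiability problem can be illustrated with a minimal simulation (not the paper's task or code): a Rescorla-Wagner learner with a softmax choice rule in a two-armed bandit. Scanning the choice log-likelihood over a grid of learning rates shows how many values fit the data nearly as well as the generating one.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_choices(alpha, beta, p_reward=(0.7, 0.3), n_trials=200):
    """Simulate a two-armed bandit with a Rescorla-Wagner learner and a
    softmax choice rule (illustrative task, not the paper's experiment)."""
    Q = np.zeros(2)
    choices, rewards = [], []
    for _ in range(n_trials):
        p1 = 1.0 / (1.0 + np.exp(-beta * (Q[1] - Q[0])))  # softmax over 2 options
        c = int(rng.random() < p1)                         # sampled choice
        r = float(rng.random() < p_reward[c])              # Bernoulli reward
        Q[c] += alpha * (r - Q[c])                         # prediction-error update
        choices.append(c)
        rewards.append(r)
    return np.array(choices), np.array(rewards)

def choice_loglik(alpha, beta, choices, rewards):
    """Log-likelihood of the observed choices under (alpha, beta)."""
    Q = np.zeros(2)
    ll = 0.0
    for c, r in zip(choices, rewards):
        p1 = 1.0 / (1.0 + np.exp(-beta * (Q[1] - Q[0])))
        ll += np.log((p1 if c == 1 else 1.0 - p1) + 1e-12)
        Q[c] += alpha * (r - Q[c])
    return ll

choices, rewards = simulate_choices(alpha=0.3, beta=3.0)
grid = np.linspace(0.05, 0.95, 19)
lls = np.array([choice_loglik(a, 3.0, choices, rewards) for a in grid])

# The log-likelihood surface is typically flat near its maximum: learning
# rates within ~2 log-units of the best fit are hard to distinguish.
near_max = grid[lls > lls.max() - 2.0]
print(near_max.min(), near_max.max())
```

The width of `near_max` is one informal way to see the range of learning rates that the choice data alone cannot rule out.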
We derive a Bayesian optimal model fitting technique that takes advantage of information contained in choices and reaction times to constrain parameter estimates.
We show using simulation and empirical data that this method substantially improves the ability to recover learning rates.
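A sketch of the general idea of combining choices and reaction times in one likelihood, under an assumed RT model (Gaussian log-RT whose mean shrinks as the value difference |Q1 - Q0| grows, i.e., easier discriminations are faster). The specific RT model and parameter names here are illustrative assumptions, not the paper's derivation.

```python
import numpy as np

rng = np.random.default_rng(1)

def joint_loglik(alpha, beta, b0, b1, sigma, choices, rewards, rts):
    """Joint log-likelihood of choices AND log-RTs for a softmax
    Rescorla-Wagner learner. RT model is an illustrative assumption:
    log-RT ~ Normal(b0 - b1*|Q1 - Q0|, sigma)."""
    Q = np.zeros(2)
    ll = 0.0
    for c, r, rt in zip(choices, rewards, rts):
        dq = Q[1] - Q[0]
        p1 = 1.0 / (1.0 + np.exp(-beta * dq))
        ll += np.log((p1 if c == 1 else 1.0 - p1) + 1e-12)       # choice term
        mu = b0 - b1 * abs(dq)                                    # easier -> faster
        ll += (-0.5 * np.log(2 * np.pi * sigma**2)
               - (np.log(rt) - mu) ** 2 / (2 * sigma**2))         # RT term
        Q[c] += alpha * (r - Q[c])                                # RW update
    return ll

# Tiny synthetic data set, just to show the call signature.
n = 100
choices = rng.integers(0, 2, size=n)
rewards = (rng.random(n) < np.where(choices == 1, 0.7, 0.3)).astype(float)
rts = np.exp(rng.normal(0.0, 0.3, size=n))  # RTs in arbitrary units

ll = joint_loglik(alpha=0.3, beta=3.0, b0=0.0, b1=0.5, sigma=0.3,
                  choices=choices, rewards=rewards, rts=rts)
print(ll)
```

Because the RT term also depends on the Q-values, trials where RTs disagree with the model's predicted difficulty penalize the candidate learning rate, which is what constrains the otherwise flat choice likelihood.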
We compare this method against the use of Bayesian priors. We show in simulations that the combined use of Bayesian priors and reaction times confers the highest parameter identifiability. However, in real data where the priors may have been misspecified, the use of Bayesian priors interferes with the ability of reaction time data to improve parameter identifiability.
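The prior-based alternative amounts to maximum a posteriori (MAP) rather than maximum likelihood estimation. A minimal sketch, assuming a hypothetical Beta(2, 2) prior on the learning rate and a simple grid search (the paper's priors and optimizer may differ):

```python
import numpy as np

rng = np.random.default_rng(2)

def choice_loglik(alpha, beta, choices, rewards):
    """Choice-only log-likelihood for a softmax Rescorla-Wagner learner."""
    Q = np.zeros(2)
    ll = 0.0
    for c, r in zip(choices, rewards):
        p1 = 1.0 / (1.0 + np.exp(-beta * (Q[1] - Q[0])))
        ll += np.log((p1 if c == 1 else 1.0 - p1) + 1e-12)
        Q[c] += alpha * (r - Q[c])
    return ll

def simulate(alpha, beta, n=150):
    """Generate choices/rewards from a known learning rate."""
    Q = np.zeros(2); C = []; R = []
    for _ in range(n):
        p1 = 1.0 / (1.0 + np.exp(-beta * (Q[1] - Q[0])))
        c = int(rng.random() < p1)
        r = float(rng.random() < (0.7 if c == 1 else 0.3))
        Q[c] += alpha * (r - Q[c])
        C.append(c); R.append(r)
    return np.array(C), np.array(R)

choices, rewards = simulate(alpha=0.3, beta=3.0)
grid = np.linspace(0.01, 0.99, 99)

# MLE: maximize the likelihood alone.
ll = np.array([choice_loglik(a, 3.0, choices, rewards) for a in grid])
alpha_mle = grid[ll.argmax()]

# MAP: add a Beta(2, 2) log-prior on alpha (hypothetical prior choice;
# the unnormalized log-density suffices for the argmax).
log_prior = np.log(grid) + np.log(1 - grid)   # (a-1)ln x + (b-1)ln(1-x), a=b=2
alpha_map = grid[(ll + log_prior).argmax()]
print(alpha_mle, alpha_map)
```

The prior pulls extreme estimates toward its mode (0.5 here), which helps when the prior matches the population but, as the abstract notes, can hurt when it is misspecified.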
We present a simple technique that takes advantage of readily available data to substantially improve the quality of inferences that can be drawn from parameters of reinforcement learning models.