Szita István, Lőrincz András
Department of Information Systems, Eötvös Loránd University, Pázmány Péter sétány 1/C, H-1117 Budapest, Hungary.
Neural Comput. 2004 Mar;16(3):491-9. doi: 10.1162/089976604772744884.
There is growing interest in using Kalman filter models in brain modeling. The question arises whether Kalman filter models can be used on-line not only for estimation but also for control. The usual method of optimal control of the Kalman filter relies on off-line backward recursion, which is unsatisfactory for this purpose. Here, it is shown that a slight modification of the linear-quadratic-Gaussian Kalman filter model overcomes this difficulty and allows on-line estimation of the optimal control by means of reinforcement learning. Moreover, the emerging learning rule for value estimation exhibits a Hebbian form, weighted by the error of the value estimation.
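The abstract describes on-line value estimation for a Kalman-filtered linear-quadratic system, with a Hebbian learning rule gated by the value-estimation (TD) error. The following is a minimal illustrative sketch, not the authors' exact model: a scalar linear system with Gaussian noise, a standard scalar Kalman filter, a quadratic value estimate V(x) = w·x², and a temporal-difference update whose weight change is proportional to the squared activity (Hebbian) times the TD error. All numerical parameters and the feedback gain are assumptions chosen for illustration.

```python
# Hypothetical sketch of on-line value learning on top of a Kalman filter.
# System (assumed): x' = a*x + b*u + process noise, observation y = x + noise.
import random

random.seed(0)

a, b = 0.9, 1.0               # assumed scalar dynamics
q, r = 1.0, 0.1               # quadratic costs on state and control
sigma_p, sigma_o = 0.1, 0.1   # process / observation noise std
gamma, alpha = 0.95, 0.01     # discount factor, learning rate


def kalman_step(x_hat, p, u, y):
    """One predict/update cycle of a scalar Kalman filter."""
    x_pred = a * x_hat + b * u
    p_pred = a * a * p + sigma_p ** 2
    k_gain = p_pred / (p_pred + sigma_o ** 2)
    x_new = x_pred + k_gain * (y - x_pred)
    p_new = (1.0 - k_gain) * p_pred
    return x_new, p_new


x, x_hat, p, w = 1.0, 1.0, 1.0, 0.0
for _ in range(2000):
    u = -0.5 * x_hat                      # simple stabilizing feedback (assumed gain)
    cost = q * x_hat ** 2 + r * u ** 2    # instantaneous quadratic cost
    x = a * x + b * u + random.gauss(0, sigma_p)
    y = x + random.gauss(0, sigma_o)
    x_next, p = kalman_step(x_hat, p, u, y)
    # TD error of the quadratic value estimate V(x) = w * x^2
    delta = cost + gamma * w * x_next ** 2 - w * x_hat ** 2
    # Hebbian update: squared activity x_hat^2, weighted by the value-estimation error
    w += alpha * delta * x_hat ** 2
    x_hat = x_next
```

Under the stabilizing feedback the closed-loop state stays bounded, and the weight w settles at a positive value approximating the discounted quadratic cost-to-go; this mirrors, in the simplest scalar setting, the kind of error-weighted Hebbian value learning the abstract refers to.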