Damiani Francesco, Anzai Akiyuki, Drugowitsch Jan, DeAngelis Gregory C, Moreno-Bote Rubén
Center for Brain and Cognition, Department of Engineering, Pompeu Fabra University, Barcelona, ES.
Department of Brain and Cognitive Sciences, University of Rochester, Rochester, USA.
Adv Neural Inf Process Syst. 2024;37:123291-123327.
A pivotal brain computation relies on the ability to sustain perception-action loops. Stochastic optimal control theory offers a mathematical framework that explains these processes at the algorithmic level through optimality principles. However, incorporating a realistic noise model of the sensorimotor system, one that accounts for multiplicative noise in feedback and motor output as well as internal noise in estimation, makes the problem challenging. The algorithm most commonly used today is the one proposed in the seminal study [1]. After identifying pitfalls in the original derivation, in particular that unbiased estimation does not hold, we improve the algorithm by proposing an efficient gradient descent-based optimization that minimizes the cost-to-go while imposing only linearity of the control law. The optimal solution is obtained by iteratively propagating the sufficient statistics in closed form to compute the expected cost, and then minimizing this cost with respect to the filter and control gains. We demonstrate that this approach yields a significantly lower overall cost than current state-of-the-art solutions, particularly in the presence of internal noise, although the improvement holds in other circumstances as well, and we provide theoretical explanations for this enhanced performance. Providing the optimal control law is key for inverse control inference, especially for explaining behavioral data under rationality assumptions.
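The procedure described in the abstract (propagate sufficient statistics in closed form, compute the expected cost, then minimize it over the filter and control gains) can be illustrated with a minimal scalar sketch. Everything below is an assumption made for illustration and is not taken from the paper: the toy model (state x, estimate x̂, control u = -L x̂, a linear filter with gain K and additive internal noise), all parameter values in `PARAMS`, and the use of finite-difference gradients with a backtracking step size in place of whatever gradient computation the authors use. For this toy model the sufficient statistics are the second moments E[x²], E[x x̂], and E[x̂²].

```python
# Toy scalar model; every name and parameter value below is an illustrative assumption.
PARAMS = dict(
    a=1.0, b=1.0, h=1.0,   # dynamics, control and observation coefficients
    c=0.5, d=0.5,          # multiplicative noise scales (motor output, sensory feedback)
    var_xi=0.01,           # additive process noise variance
    var_omega=0.09,        # additive observation noise variance
    var_eta=0.01,          # internal (estimator) noise variance
    q=1.0, r=0.1,          # state and control cost weights
    x0=1.0,                # known initial state; the estimate starts equal to it
)


def expected_cost(L, K, p=PARAMS):
    """Expected quadratic cost for control gains L[t] and filter gains K[t].

    Propagates E[x^2], E[x*xhat], E[xhat^2] in closed form under
      x'    = a x - b L xhat (1 + c eps) + xi
      xhat' = (a - b L) xhat + K (h x (1 + d eps2) + omega - h xhat) + eta
    """
    sxx = sxe = see = p["x0"] * p["x0"]
    cost = 0.0
    for Lt, Kt in zip(L, K):
        cost += p["q"] * sxx + p["r"] * Lt * Lt * see   # state cost + control cost
        m11, m12 = p["a"], -p["b"] * Lt
        m21 = Kt * p["h"]
        m22 = p["a"] - p["b"] * Lt - Kt * p["h"]
        mot = p["b"] * p["c"] * Lt     # multiplicative motor noise factor
        sen = Kt * p["h"] * p["d"]     # multiplicative sensory noise factor
        new_sxx = (m11 * m11 * sxx + 2 * m11 * m12 * sxe + m12 * m12 * see
                   + mot * mot * see + p["var_xi"])
        new_sxe = m11 * m21 * sxx + (m11 * m22 + m12 * m21) * sxe + m12 * m22 * see
        new_see = (m21 * m21 * sxx + 2 * m21 * m22 * sxe + m22 * m22 * see
                   + sen * sen * sxx + Kt * Kt * p["var_omega"] + p["var_eta"])
        sxx, sxe, see = new_sxx, new_sxe, new_see
    return cost + p["q"] * sxx         # add the final state cost


def numerical_gradient(f, theta, eps=1e-6):
    """Central finite-difference gradient of f at theta."""
    grad = []
    for i in range(len(theta)):
        up = list(theta)
        up[i] += eps
        dn = list(theta)
        dn[i] -= eps
        grad.append((f(up) - f(dn)) / (2 * eps))
    return grad


def optimize_gains(T=10, iters=200, p=PARAMS):
    """Gradient descent on the expected cost with a backtracking step size."""
    theta = [0.1] * (2 * T)   # first T entries: control gains, last T: filter gains

    def f(th):
        return expected_cost(th[:T], th[T:], p)

    cost = f(theta)
    for _ in range(iters):
        grad = numerical_gradient(f, theta)
        step = 1.0
        while step > 1e-12:
            trial = [t - step * g for t, g in zip(theta, grad)]
            trial_cost = f(trial)
            if trial_cost < cost:      # accept only steps that lower the cost
                theta, cost = trial, trial_cost
                break
            step *= 0.5
        else:
            break                      # no improving step found: stop early
    return theta[:T], theta[T:], cost
```

Calling `L, K, cost = optimize_gains()` and comparing `cost` with `expected_cost([0.1] * 10, [0.1] * 10)` shows how much the descent lowers the expected cost from the initial gains in this toy setting. Because the backtracking rule only accepts steps that reduce the cost, the descent is monotone, but it finds a local minimum of this sketch only and says nothing about the results reported for the full model.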