Isotropic-sequence-order learning in a closed-loop behavioural system.

Author information

Porr Bernd, Wörgötter Florentin

Affiliation

Department of Psychology, University of Stirling, Stirling FK9 4LA, UK.

Publication information

Philos Trans A Math Phys Eng Sci. 2003 Oct 15;361(1811):2225-44. doi: 10.1098/rsta.2003.1273.

DOI:10.1098/rsta.2003.1273

PMID:14599317

Abstract

The simplest form of sensor-motor control is obtained with a reflex. In this case the reflex can be interpreted as part of a closed-loop control paradigm which measures a sensor input and generates a motor reaction as soon as the sensor signal deviates from its desired (resting) state. This is a typical case of feedback control. However, reflex reactions are tardy, because they always occur only after a (for example, unpleasant) reflex-eliciting sensor event. This poses an objective problem for the organism, one that can be avoided only if the corresponding motor reaction is generated earlier. The goal of this study is to design a closed-loop control situation where temporal-sequence learning supersedes a tardy reflex reaction with a proactive anticipatory action. We achieve this by employing a second, earlier-occurring and causally coupled sensor event. An appropriate motor reaction to this early event prevents triggering of the original, primary reflex. Such causally coupled sensor events are common for animals, for example when smell predicts taste or when heat radiation precedes pain. We show that trying to achieve anticipatory control is a fundamentally different goal from trying to model a classical conditioning paradigm, which is an open-loop condition. To this end, we use a novel learning rule for temporal-sequence learning called isotropic-sequence-order (ISO) learning, which performs a confounded correlation between the primary sensor signal associated with the reflex and a predictive, earlier-occurring sensor input: this way the system learns the relation between the primary reflex and the earlier sensor input in order to create an earlier-occurring motor reaction. As a consequence of learning, the primary reflex will no longer be triggered, and so remains permanently in its desired resting state. In a robot application, we demonstrate that ISO learning can successfully solve the classical obstacle-avoidance task by learning to correlate a built-in reflex behaviour (retraction after touching) with earlier-arising signals from range finders (before touching). Finally, we show that avoidance and attraction tasks can be combined in the same agent.
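
The abstract names the learning rule but does not write it out. The sketch below is one way to picture it, assuming the differential Hebbian form commonly given for ISO learning: the output v is a weighted sum of filtered inputs u_j, and each weight changes in proportion to u_j times the temporal derivative of v. The damped-sine filter, the parameter values, the fixed reflex weight and the names `resonator_response` and `run_trials` are illustrative assumptions, not taken from this record. The simulation is open loop (the output never feeds back through an environment), so it shows only the sensitivity of the weight update to input order, not the avoidance behaviour described in the abstract.

```python
import math


def resonator_response(n_steps, onset, damping=0.03, omega=0.0628):
    """Damped-sine response to a unit pulse at step `onset` (assumed filter shape)."""
    return [
        math.exp(-damping * (n - onset)) * math.sin(omega * (n - onset))
        if n >= onset else 0.0
        for n in range(n_steps)
    ]


def run_trials(x1_onset, x0_onset, n_trials=20, n_steps=400, mu=0.01):
    """Return the weight of input x1 after repeated pairings with the reflex input x0."""
    u0 = resonator_response(n_steps, x0_onset)  # filtered reflex input
    u1 = resonator_response(n_steps, x1_onset)  # filtered second input
    rho0 = 1.0  # reflex weight, held fixed here (simplifying assumption)
    rho1 = 0.0  # weight of the second input, learned
    for _ in range(n_trials):
        v_prev = 0.0
        for n in range(n_steps):
            v = rho0 * u0[n] + rho1 * u1[n]
            # Assumed rule: weight change = mu * filtered input * derivative of output,
            # with the derivative approximated by a backward difference.
            rho1 += mu * u1[n] * (v - v_prev)
            v_prev = v
    return rho1


# x1 arrives 10 steps before the reflex input: it is predictive.
predictive_weight = run_trials(x1_onset=50, x0_onset=60)
# x1 arrives 10 steps after the reflex input: it predicts nothing.
late_weight = run_trials(x1_onset=60, x0_onset=50)
print(predictive_weight, late_weight)
```

Under these assumptions the weight grows when x1 precedes the reflex input and shrinks when it follows it, which is the order sensitivity the abstract relies on. In the closed-loop setting the abstract describes, a learned reaction to x1 would additionally keep the reflex input at rest, which this sketch does not model.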
