Learning reaching strategies through reinforcement for a sensor-based manipulator.

Author information

Martín P, Millán J del R

Affiliation

Department of Computer Science, University of Jaume I, Campus de Penyeta Roja, 12071 Castellón, Spain.

Publication information

Neural Netw. 1998 Mar 31;11(2):359-76. doi: 10.1016/s0893-6080(97)00137-8.

DOI:10.1016/s0893-6080(97)00137-8

PMID:12662844

Abstract

This paper presents a neural controller that learns goal-oriented, obstacle-avoiding reaction strategies for a multilink robot arm. It acquires these strategies on-line from local sensory data. The controller consists of two neural modules: an actor-critic module and a module for differential inverse kinematics (DIV). The input coding for the controller exploits the inherent symmetry of the robot arm kinematics. The actor-critic module generates actions with regard to the Shortest Path Vector (SPV) to the closest goal in the configuration space. However, computing the SPV is cumbersome for manipulators with more than two links. The DIV module aims to avoid this explicit SPV calculation: it provides a goal vector by inverting a neural network previously trained to approximate the manipulator's forward kinematics. Results for a two-link robot arm show that combining both modules speeds up the learning process.
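The abstract does not give the DIV equations. As a rough illustration of what inverting a forward-kinematics model to obtain a joint-space goal vector can look like, the sketch below makes several assumptions that are not from the paper: a planar two-link arm with hypothetical unit link lengths, the analytic forward kinematics standing in for the trained network, a finite-difference Jacobian, and a damped least-squares step. The names `forward_kinematics`, `goal_vector`, and `reach`, as well as the damping and step-size values, are hypothetical. Obstacle avoidance and the actor-critic module are omitted.

```python
import math

# Hypothetical link lengths for a planar two-link arm (assumption, not from the paper).
LINK1, LINK2 = 1.0, 1.0


def forward_kinematics(theta):
    """Stand-in for the trained forward-kinematics network: joint angles -> (x, y)."""
    t1, t2 = theta
    x = LINK1 * math.cos(t1) + LINK2 * math.cos(t1 + t2)
    y = LINK1 * math.sin(t1) + LINK2 * math.sin(t1 + t2)
    return (x, y)


def numerical_jacobian(f, theta, eps=1e-6):
    """Finite-difference Jacobian J[i][j] = d f_i / d theta_j."""
    base = f(theta)
    cols = []
    for j in range(len(theta)):
        perturbed = list(theta)
        perturbed[j] += eps
        shifted = f(perturbed)
        cols.append([(shifted[i] - base[i]) / eps for i in range(len(base))])
    # Transpose the list of columns into a row-major matrix.
    return [[cols[j][i] for j in range(len(theta))] for i in range(len(base))]


def goal_vector(theta, target, damping=1e-3):
    """Damped least-squares joint-space direction toward a Cartesian target:
    dtheta = J^T (J J^T + damping^2 I)^-1 e, with e = target - f(theta)."""
    x, y = forward_kinematics(theta)
    e = (target[0] - x, target[1] - y)
    J = numerical_jacobian(forward_kinematics, theta)
    # Entries of the symmetric 2x2 matrix A = J J^T + damping^2 I.
    a = J[0][0] ** 2 + J[0][1] ** 2 + damping ** 2
    b = J[0][0] * J[1][0] + J[0][1] * J[1][1]
    d = J[1][0] ** 2 + J[1][1] ** 2 + damping ** 2
    det = a * d - b * b
    # w = A^-1 e, using the closed-form inverse of a 2x2 matrix.
    w0 = (d * e[0] - b * e[1]) / det
    w1 = (-b * e[0] + a * e[1]) / det
    # Map back to joint space: dtheta = J^T w.
    return (J[0][0] * w0 + J[1][0] * w1, J[0][1] * w0 + J[1][1] * w1)


def reach(theta, target, step=0.5, iters=100):
    """Repeatedly move a fraction of the goal vector (obstacle-free case only)."""
    theta = list(theta)
    for _ in range(iters):
        dt = goal_vector(theta, target)
        theta = [theta[0] + step * dt[0], theta[1] + step * dt[1]]
    return theta


if __name__ == "__main__":
    final = reach((0.3, 0.5), (1.2, 0.8))
    print(final, forward_kinematics(final))
```

In this sketch the goal vector plays the role the abstract assigns to DIV: a configuration-space direction toward the goal obtained without computing a shortest path explicitly. The damping term only keeps the 2x2 solve well conditioned near singular configurations. It is a common choice for differential inverse kinematics, not necessarily the one used by the authors.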
