Günther Johannes, Ady Nadia M, Kearney Alex, Dawson Michael R, Pilarski Patrick M
Department of Computing Science, University of Alberta, Edmonton, AB, Canada.
Alberta Machine Intelligence Institute, Edmonton, AB, Canada.
Front Robot AI. 2020 Mar 13;7:34. doi: 10.3389/frobt.2020.00034. eCollection 2020.
Predictions and predictive knowledge have seen recent success in improving not only robot control but also other applications ranging from industrial process control to rehabilitation. A property that makes these predictive approaches well-suited for robotics is that they can be learned online and incrementally through interaction with the environment. However, a remaining challenge for many prediction-learning approaches is an appropriate choice of prediction-learning parameters, especially parameters that control the magnitude of a learning machine's updates to its predictions (the step size or learning rate). Typically, these parameters are chosen through an extensive parameter search, an approach that neither scales well nor suits tasks that require changing step sizes due to non-stationarity. To begin to address this challenge, we examine online step-size adaptation on the Modular Prosthetic Limb: a sensor-rich robotic arm intended for use by persons with amputations. Our method of choice, Temporal-Difference Incremental Delta-Bar-Delta (TIDBD), learns and adapts step sizes on a feature level; importantly, TIDBD allows step-size tuning and representation learning to occur at the same time. As a first contribution, we show via an extensive parameter search that TIDBD is a practical alternative to classic Temporal-Difference (TD) learning. Both approaches perform comparably in terms of predicting future aspects of a robotic data stream, but TD achieves this performance only with a carefully hand-tuned learning rate, while TIDBD uses a robust meta-parameter and tunes its own learning rates. As a second contribution, our results show that for this particular application TIDBD allows the system to automatically detect patterns characteristic of sensor failures common to a number of robotic applications. As a third contribution, we investigate the sensitivity of classic TD and TIDBD with respect to the initial step-size values on our robotic data set, reaffirming the robustness of TIDBD as shown in previous papers. Together, these results promise to improve the ability of robotic devices to learn from interactions with their environments in a robust way, providing key capabilities for autonomous agents and robots.
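The abstract does not state the update rules, so the following is only a minimal sketch of what feature-level step-size adaptation for TD learning could look like. It assumes linear function approximation and a semi-gradient, IDBD-style meta-update in which each feature i keeps a log step size `beta[i]` and a memory trace `h[i]`. The function name `tidbd_step`, the variable names, and the toy values in the usage lines are all illustrative assumptions, not the authors' implementation.

```python
import math


def tidbd_step(w, beta, h, z, x, x_next, reward, gamma, lam, theta):
    """One TD(lambda) update with per-feature step sizes (assumed form).

    w      weights of the linear predictor
    beta   log step sizes, so alpha_i = exp(beta_i) stays positive
    h      per-feature memory of recent weight updates
    z      accumulating eligibility traces
    theta  meta step size (the single parameter left to choose)
    All lists are modified in place; the TD error is returned.
    """
    v = sum(wi * xi for wi, xi in zip(w, x))
    v_next = sum(wi * xi for wi, xi in zip(w, x_next))
    delta = reward + gamma * v_next - v  # TD error

    for i in range(len(w)):
        # Meta-update: grow the step size when the current error
        # correlates with recent updates to this weight, shrink otherwise.
        beta[i] += theta * delta * x[i] * h[i]
        alpha = math.exp(beta[i])

        # Ordinary TD(lambda) update, but with a feature-specific step size.
        z[i] = gamma * lam * z[i] + x[i]
        w[i] += alpha * delta * z[i]

        # Decay the memory trace (clipped at zero) and add the new update.
        decay = max(0.0, 1.0 - alpha * x[i] * z[i])
        h[i] = h[i] * decay + alpha * delta * z[i]

    return delta


# Usage on a hypothetical two-feature transition, repeated twice.
n = 2
w, h, z = [0.0] * n, [0.0] * n, [0.0] * n
beta = [math.log(0.1)] * n  # every feature starts with step size 0.1
for _ in range(2):
    tidbd_step(w, beta, h, z, x=[1.0, 0.0], x_next=[0.0, 1.0],
               reward=1.0, gamma=0.9, lam=0.0, theta=0.01)
print([math.exp(b) for b in beta])  # per-feature step sizes after two updates
```

In this sketch only the step size of the active feature changes, which is the feature-level behaviour the abstract describes: step sizes adapt per input rather than through one global, hand-tuned learning rate.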