Arapi Visar, Della Santina Cosimo, Bacciu Davide, Bianchi Matteo, Bicchi Antonio
Centro di Ricerca "Enrico Piaggio," Università di Pisa, Pisa, Italy.
Dipartimento di Ingegneria dell'Informazione, Università di Pisa, Pisa, Italy.
Front Neurorobot. 2018 Dec 17;12:86. doi: 10.3389/fnbot.2018.00086. eCollection 2018.
Humans are capable of complex manipulation interactions with the environment, relying on the intrinsic adaptability and compliance of their hands. Recently, soft robotic manipulation has attempted to reproduce this extraordinary behavior through the design of deformable yet robust end-effectors. To this end, the investigation of human behavior has become crucial to correctly inform the technological development of robotic hands that can successfully exploit environmental constraints as humans actually do. Among the different tools robotics can leverage to achieve this objective, deep learning has emerged as a promising approach for studying, and then implementing, neuroscientific observations on the artificial side. However, current approaches tend to neglect the dynamic nature of hand pose recognition problems, limiting the effectiveness of these techniques in identifying the sequences of manipulation primitives that underpin action generation, e.g., during purposeful interaction with the environment. In this work, we propose a vision-based supervised Hand Pose Recognition method which, for the first time, takes temporal information into account to identify meaningful sequences of actions in grasping and manipulation tasks. More specifically, we apply Deep Neural Networks to automatically learn features from hand posture images, which consist of frames extracted from videos of grasping and manipulation tasks involving objects and external environmental constraints. For training purposes, the videos are divided into intervals, each associated with a specific action by a human supervisor. The proposed algorithm combines a Convolutional Neural Network, which detects the hand within each video frame, with a Recurrent Neural Network, which predicts the hand action in the current frame while taking into account the history of actions performed in the previous frames.
Experimental validation was performed on two datasets of dynamic, hand-centric strategies, in which subjects regularly interact with objects and the environment. The proposed architecture achieved very good classification accuracy on both datasets, reaching up to 94% and outperforming state-of-the-art techniques. The outcomes of this study can be successfully applied to robotics, e.g., for the planning and control of soft anthropomorphic manipulators.
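The per-frame pipeline described above (CNN features for the detected hand, then a recurrent update that carries the history of previous frames into the current action prediction) can be illustrated with a minimal, untrained sketch. This is not the authors' architecture or code: the dimensions, the random-projection "CNN" stage, and the hand-rolled Elman-style recurrent cell are all hypothetical stand-ins chosen only to show how hidden state links consecutive frame classifications.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions; the paper's actual network sizes are not reproduced here.
FRAME_FEATURES = 64   # stand-in for CNN features of the detected hand crop
HIDDEN = 32
N_ACTIONS = 5         # e.g., a small set of grasp/manipulation primitives

# Stand-in "CNN" stage: a fixed random projection of a flattened 16x16 hand crop.
W_cnn = rng.standard_normal((FRAME_FEATURES, 16 * 16))

# Minimal Elman-style recurrent cell followed by a linear softmax classifier.
W_xh = 0.1 * rng.standard_normal((HIDDEN, FRAME_FEATURES))
W_hh = 0.1 * rng.standard_normal((HIDDEN, HIDDEN))
W_hy = 0.1 * rng.standard_normal((N_ACTIONS, HIDDEN))

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def classify_sequence(frames):
    """Predict one action label per frame, carrying hidden state across time
    so the prediction for the current frame depends on previous frames."""
    h = np.zeros(HIDDEN)
    labels = []
    for frame in frames:
        x = W_cnn @ frame.ravel()           # per-frame features ("CNN" stage)
        h = np.tanh(W_xh @ x + W_hh @ h)    # recurrent update over frame history
        labels.append(int(np.argmax(softmax(W_hy @ h))))
    return labels

# Toy "video": 10 frames of 16x16 hand crops.
video = rng.standard_normal((10, 16, 16))
print(classify_sequence(video))
```

With trained weights in place of the random ones, the same loop structure would emit a temporally coherent action label per frame, which is the kind of output a planner for a soft manipulator could consume.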