Alireza Haji Fathaliyan, Xiaoyu Wang, Veronica J. Santos
Biomechatronics Laboratory, Mechanical and Aerospace Engineering, University of California, Los Angeles, Los Angeles, CA, United States.
Front Robot AI. 2018 Apr 4;5:25. doi: 10.3389/frobt.2018.00025. eCollection 2018.
Human-robot collaboration could be advanced by facilitating the intuitive, gaze-based control of robots, and enabling robots to recognize human actions, infer human intent, and plan actions that support human goals. Traditionally, gaze tracking approaches to action recognition have relied upon computer vision-based analyses of two-dimensional egocentric camera videos. The objective of this study was to identify useful features that can be extracted from three-dimensional (3D) gaze behavior and used as inputs to machine learning algorithms for human action recognition. We investigated human gaze behavior and gaze-object interactions in 3D during the performance of a bimanual, instrumental activity of daily living: the preparation of a powdered drink. A marker-based motion capture system and binocular eye tracker were used to reconstruct 3D gaze vectors and their intersection with 3D point clouds of objects being manipulated. Statistical analyses of gaze fixation duration and saccade size suggested that some actions (pouring and stirring) may require more visual attention than other actions (reach, pick up, set down, and move). 3D gaze saliency maps, generated with high spatial resolution for six subtasks, appeared to encode action-relevant information. The "gaze object sequence" was used to capture information about the identity of objects in concert with the temporal sequence in which the objects were visually regarded. Dynamic time warping barycentric averaging was used to create a population-based set of characteristic gaze object sequences that accounted for intra- and inter-subject variability. The gaze object sequence was used to demonstrate the feasibility of a simple action recognition algorithm that utilized a dynamic time warping Euclidean distance metric. Averaged over the six subtasks, the action recognition algorithm yielded an accuracy of 96.4%, precision of 89.5%, and recall of 89.2%. This level of performance suggests that the gaze object sequence is a promising feature for action recognition whose impact could be enhanced through the use of sophisticated machine learning classifiers and algorithmic improvements for real-time implementation. Robots capable of robust, real-time recognition of human actions during manipulation tasks could be used to improve quality of life in the home and quality of work in industrial environments.
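The following minimal sketch (Python, assuming NumPy) illustrates the kind of DTW-based nearest-neighbor matching the abstract describes: a query gaze object sequence, encoded as a series of object identifiers, is assigned the label of the closest population-averaged template. The object IDs, template sequences, and subtask labels here are hypothetical stand-ins; the paper's DBA-derived templates and preprocessing are not reproduced.

import numpy as np

def dtw_distance(a: np.ndarray, b: np.ndarray) -> float:
    """DTW distance between two 1-D sequences with a Euclidean
    (absolute-difference) local cost, per the abstract's metric."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(float(a[i - 1]) - float(b[j - 1]))
            D[i, j] = cost + min(D[i - 1, j],       # insertion
                                 D[i, j - 1],       # deletion
                                 D[i - 1, j - 1])   # match
    return float(D[n, m])

def classify(query: np.ndarray, templates: dict) -> str:
    """1-nearest-neighbor: return the label of the characteristic
    template sequence closest to the query under DTW distance."""
    return min(templates, key=lambda label: dtw_distance(query, templates[label]))

# Hypothetical example: object IDs 0 = pitcher, 1 = cup, 2 = spoon,
# sampled per gaze fixation; the templates stand in for DBA-averaged
# characteristic gaze object sequences for two subtasks.
templates = {
    "pour": np.array([0, 0, 1, 1, 1]),
    "stir": np.array([2, 1, 2, 2, 1]),
}
query = np.array([0, 1, 1, 1])
print(classify(query, templates))  # -> "pour"

DTW, rather than a fixed point-by-point distance, lets sequences of different lengths and speeds be compared along an optimal alignment, which is what allows a single characteristic template per subtask to absorb intra- and inter-subject timing variability.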