Zhuang HongChao, Xia YiLu, Wang Ning, Li WeiHua, Dong Lei, Li Bo
School of Mechanical Engineering, Tianjin University of Technology and Education, Tianjin, 300222 China.
School of Information Technology Engineering, Tianjin University of Technology and Education, Tianjin, 300222 China.
Sci China Technol Sci. 2023;66(6):1717-1733. doi: 10.1007/s11431-022-2368-y. Epub 2023 May 9.
A lightweight human-robot interaction model with high real-time performance, high accuracy, and strong anti-interference capability is well suited to future lunar surface exploration and construction work. Using feature information from a monocular camera, the model acquires, processes, and fuses the signals of two interaction modes: astronaut gestures and eye movement. Compared with a single mode, bimodal collaboration issues complex interactive commands more efficiently. The target detection model is optimized by inserting an attention mechanism into YOLOv4 and filtering image motion blur. A neural network identifies the central coordinates of the pupils to realize human-robot interaction in the eye-movement mode. The astronaut gesture signal and the eye-movement signal are fused at the end of the collaborative model, so that complex command interaction is achieved with a lightweight model. The dataset used for network training is augmented and extended to simulate a realistic lunar space interaction environment. The interaction performance for complex commands in a single mode is compared with that under bimodal collaboration. The experimental results show that the concatenated interaction model of the astronaut gesture and eye-movement signals mines the bimodal interaction signal better, discriminates complex interaction commands more quickly, and, owing to its stronger feature mining ability, has stronger signal anti-interference capability. Compared with command interaction using the gesture signal alone or the eye-movement signal alone, the bimodal collaboration model shortens the interaction time by about 79% to 91%. Regardless of the image interference applied, the overall judgment accuracy of the proposed model remains at about 83% to 97%, which verifies the effectiveness of the proposed method.
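The fusion of a gesture signal and an eye-movement signal at the end of the pipeline can be illustrated with a minimal Python sketch. It is not the authors' implementation: the gesture labels, the `GestureDetection` type, the `gaze_region` and `fuse` helpers, the dead-zone width, and the confidence threshold are all hypothetical, and the detector and pupil-localization networks are assumed to have already produced their outputs.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class GestureDetection:
    """Output of a gesture detector (hypothetical structure)."""
    label: str         # e.g. "move", "stop"; labels are illustrative only
    confidence: float  # detector score in [0, 1]


def gaze_region(pupil_xy: Tuple[float, float],
                eye_box: Tuple[float, float, float, float],
                dead_zone: float = 0.15) -> str:
    """Map a pupil center to a coarse gaze direction.

    eye_box is (x_min, y_min, x_max, y_max) in image pixels. Directions are
    expressed in image coordinates (y grows downward, no mirroring).
    dead_zone is an assumed fraction of the box size treated as "center".
    """
    x_min, y_min, x_max, y_max = eye_box
    center_x = (x_min + x_max) / 2.0
    center_y = (y_min + y_max) / 2.0
    # Offset of the pupil from the box center, normalized by box size.
    dx = (pupil_xy[0] - center_x) / (x_max - x_min)
    dy = (pupil_xy[1] - center_y) / (y_max - y_min)
    if abs(dx) <= dead_zone and abs(dy) <= dead_zone:
        return "center"
    if abs(dx) >= abs(dy):
        return "right" if dx > 0 else "left"
    return "down" if dy > 0 else "up"


def fuse(gesture: GestureDetection, gaze: str,
         min_confidence: float = 0.5) -> Optional[Tuple[str, str]]:
    """Late fusion: pair the gesture label with the gaze direction.

    Returns None when the gesture score is below the assumed threshold,
    so that an unreliable detection does not issue a command.
    """
    if gesture.confidence < min_confidence:
        return None
    return (gesture.label, gaze)


if __name__ == "__main__":
    gaze = gaze_region((70.0, 50.0), (0.0, 0.0, 100.0, 100.0))
    print(fuse(GestureDetection("move", 0.9), gaze))
```

In this sketch a single frame yields a compound command such as a gesture verb plus a gaze direction, which is one plausible way a bimodal model could express in one step what a single mode would need several steps to convey.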