College of Automation, Chongqing University of Posts and Telecommunications, Chongqing, 400065, China.
College of Computer Science, Chongqing University, Chongqing, 400044, China.
Sci Rep. 2022 Mar 14;12(1):4345. doi: 10.1038/s41598-022-08133-z.
Gesture recognition is one of the most popular techniques in computer vision. Many gesture recognition algorithms have been proposed in recent years, but most do not balance recognition efficiency and accuracy well, so a dynamic gesture recognition algorithm that achieves this balance remains worthwhile. Most commonly used dynamic gesture recognition algorithms are currently based on 3D convolutional neural networks. Although 3D convolutional neural networks consider both spatial and temporal features, the networks are too complex, which is the main reason for the algorithms' low efficiency. To address this problem, we propose a recognition method that combines 2D convolutional neural networks with feature fusion. Original keyframes and optical flow keyframes represent spatial and temporal features, respectively, and are fed to the 2D convolutional neural network for feature fusion and final recognition. To ensure the quality of the extracted optical flow maps without increasing network complexity, we extract them with a fractional-order method, combining fractional calculus with deep learning. Finally, we verify the effectiveness of our algorithm on the Cambridge Hand Gesture dataset and the Northwestern University Hand Gesture dataset. The experimental results show that our algorithm achieves high accuracy while keeping network complexity low.
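The abstract does not state which fractional-order definition is used, so the following is only an illustrative sketch of what a fractional-order difference looks like, assuming the Grünwald-Letnikov definition with unit pixel spacing. The function names `gl_coefficients` and `fractional_difference`, the truncation length `n_terms`, and the toy intensity row are hypothetical and are not taken from the paper.

```python
def gl_coefficients(alpha, n_terms):
    """Grünwald-Letnikov weights w_k = (-1)^k * C(alpha, k), built recursively."""
    weights = [1.0]
    for k in range(1, n_terms):
        # Recurrence: w_k = w_{k-1} * (1 - (alpha + 1) / k)
        weights.append(weights[-1] * (1.0 - (alpha + 1.0) / k))
    return weights


def fractional_difference(signal, alpha, n_terms=4):
    """Approximate the order-alpha derivative of a 1D signal with unit spacing.

    Samples before the start of the signal are treated as zero
    (an assumption made for this sketch).
    """
    weights = gl_coefficients(alpha, n_terms)
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for k, w in enumerate(weights):
            if i - k >= 0:
                acc += w * signal[i - k]
        out.append(acc)
    return out


row = [0.0, 1.0, 2.0, 3.0, 4.0]  # toy intensity ramp, not data from the paper
print(fractional_difference(row, alpha=1.0))  # integer order: backward difference
print(fractional_difference(row, alpha=0.5))  # fractional order: weighted history
```

With `alpha=1.0` the weights reduce to 1, -1, 0, 0, so the sketch recovers the ordinary backward difference. With a non-integer order, each output also draws on earlier samples with decaying weights, which is the general property a fractional-order image derivative exploits.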