Department of Electronics and Communication Engineering, National Institute of Technology, Rourkela 769008, Odisha, India.
Department of Computer Science, Faculty of Computer Science and Telecommunications, Cracow University of Technology, Warszawska 24, 31-155 Krakow, Poland.
Sensors (Basel). 2022 Jan 18;22(3):706. doi: 10.3390/s22030706.
Hand gesture recognition is one of the most effective modes of human-computer interaction because it is highly flexible and user-friendly. A real-time hand gesture recognition system should provide a user-independent interface with high recognition performance. Convolutional neural networks (CNNs) now achieve high recognition rates in image classification problems. However, because large labeled datasets of static hand gesture images are unavailable, training deep CNNs such as AlexNet, VGG-16 and ResNet from scratch is challenging. Therefore, an end-to-end fine-tuning method for a pre-trained CNN model, combined with a score-level fusion technique, is proposed here to recognize hand gestures in datasets with few gesture images. The effectiveness of the proposed technique is evaluated using leave-one-subject-out cross-validation (LOO CV) and regular CV tests on two benchmark datasets. A real-time American Sign Language (ASL) recognition system is developed and tested using the proposed technique.
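As a rough illustration of score-level fusion, the sketch below combines the class scores of several classifiers by a weighted sum and then picks the highest fused score. The abstract does not state the fusion rule, the number of fused models, or the weights, so the sum rule, the equal default weights, the two example models, and the names `softmax` and `fuse_scores` are all assumptions made for illustration, not the authors' implementation.

```python
import numpy as np


def softmax(logits):
    """Row-wise softmax, shifted by the row maximum for numerical stability."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)


def fuse_scores(score_list, weights=None):
    """Weighted sum-rule fusion of per-model class scores (assumed rule).

    score_list: list of arrays, each of shape (n_samples, n_classes)
    weights:    optional per-model weights; equal weights if omitted
    """
    scores = np.stack(score_list)  # (n_models, n_samples, n_classes)
    if weights is None:
        weights = np.ones(len(score_list))
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()  # normalize so fused rows still sum to 1
    return np.tensordot(weights, scores, axes=1)  # (n_samples, n_classes)


# Hypothetical logits from two fine-tuned CNNs: 2 images, 3 gesture classes.
logits_a = np.array([[2.0, 1.0, 0.1], [0.2, 0.3, 2.5]])
logits_b = np.array([[0.5, 2.2, 0.3], [0.1, 0.4, 3.0]])

fused = fuse_scores([softmax(logits_a), softmax(logits_b)])
predicted = fused.argmax(axis=1)  # fused class decision per image
```

Fusing at the score level keeps each model's confidence in every class, so a model that is only slightly unsure can still be outvoted, which a decision-level majority vote over hard labels would not allow.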