Faculty of Engineering Technology, Hung Vuong University, Viet Tri City 35100, Vietnam.
Department of Intelligent Computer Systems, Czestochowa University of Technology, 42-218 Czestochowa, Poland.
Sensors (Basel). 2023 Mar 20;23(6):3255. doi: 10.3390/s23063255.
Hand detection and classification is an important pre-processing step in building applications based on three-dimensional (3D) hand pose estimation and hand activity recognition. To automatically delimit the hand region in egocentric vision (EV) datasets, and in particular to trace the development and performance of the "You Only Look Once" (YOLO) network over the past seven years, we present a study comparing the efficiency of hand detection and classification across the YOLO-family networks. The study addresses the following problems: (1) systematizing the architectures, advantages, and disadvantages of the YOLO-family networks from version (v)1 to v7; (2) preparing ground-truth data for the pre-trained models and for evaluating hand detection and classification models on EV datasets (FPHAB, HOI4D, RehabHand); (3) fine-tuning hand detection and classification models based on the YOLO-family networks and evaluating them on the EV datasets. The YOLOv7 network and its variations gave the best hand detection and classification results across all three datasets. The YOLOv7-w6 network achieves a precision (P) of 97% on FPHAB, 95% on HOI4D, and more than 95% on RehabHand, each at an intersection-over-union (IoU) threshold of 0.5. YOLOv7-w6 runs at 60 fps at a resolution of 1280 × 1280 pixels, and YOLOv7 runs at 133 fps at a resolution of 640 × 640 pixels.
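To make the reported figures concrete, the following is a minimal sketch of how a detection precision at a fixed threshold of 0.5 is commonly computed, assuming the threshold refers to the intersection over union (IoU) between predicted and ground-truth hand boxes. The function names, the corner-coordinate box format, and the greedy one-to-one matching are illustrative assumptions, not the evaluation code used in the study.

```python
from typing import List, Tuple

# Assumed box format: (x_min, y_min, x_max, y_max) in pixels.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def precision_at_iou(preds: List[Box], truths: List[Box],
                     thresh: float = 0.5) -> float:
    """Fraction of predictions matched to a distinct ground-truth box.

    Hypothetical helper: predictions are assumed to be sorted by
    descending confidence, and each ground-truth box can be matched once.
    """
    matched = set()
    true_positives = 0
    for p in preds:
        best_j, best_iou = -1, 0.0
        for j, t in enumerate(truths):
            if j in matched:
                continue
            value = iou(p, t)
            if value > best_iou:
                best_j, best_iou = j, value
        # A prediction counts as correct only if its best match
        # reaches the IoU threshold.
        if best_j >= 0 and best_iou >= thresh:
            matched.add(best_j)
            true_positives += 1
    return true_positives / len(preds) if preds else 0.0


if __name__ == "__main__":
    predictions = [(0, 0, 10, 10), (20, 20, 30, 30)]
    ground_truth = [(1, 1, 10, 10)]
    print(precision_at_iou(predictions, ground_truth, thresh=0.5))
```

In this toy case one of the two predicted boxes overlaps the single ground-truth box well enough to count as correct, so the precision is one half. Raising the threshold makes matching stricter and can only lower or preserve the score.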