Egocentric-View Fingertip Detection for Air Writing Based on Convolutional Neural Networks.

Affiliations

Department of Computer Science and Information Engineering, National Central University, Taoyuan City 32001, Taiwan.

Department of Electrical Engineering, National Taipei University of Technology, Taipei 10608, Taiwan.

Publication information

Sensors (Basel). 2021 Jun 26;21(13):4382. doi: 10.3390/s21134382.

DOI:10.3390/s21134382

PMID:34206768

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC8272142/

Abstract

This research investigates real-time fingertip detection in frames captured by smart glasses, an increasingly popular wearable device. Egocentric-view fingertip detection combined with character recognition enables a novel way of inputting text. We first used Unity3D to build a synthetic dataset of pointing gestures seen from the first-person perspective. Synthetic data eliminate the need for time-consuming and error-prone manual labeling, and they provide a large, high-quality dataset for a wide range of purposes. We then propose a modified Mask Regional Convolutional Neural Network (Mask R-CNN), consisting of a region-based CNN for finger detection and a three-layer CNN for fingertip localization. The process takes 25 ms per frame for 640×480 RGB images, with an average error of 8.3 pixels. This is fast enough to enable real-time "air-writing", in which users wearing smart glasses write characters in the air to input text or commands. The characters are recognized from the fingertip trajectories by a ResNet-based CNN. Experimental results demonstrate the feasibility of this methodology.
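
To make the reported figures concrete, the following is a minimal Python sketch and not the authors' implementation. It assumes that the "average error" is the mean Euclidean distance between predicted and ground-truth fingertip positions, and that a fingertip trajectory is drawn onto a small binary image before being passed to a character classifier. The function names, the 28×28 grid size, and the margin are hypothetical choices made only for illustration.

```python
import math


def frames_per_second(ms_per_frame):
    """Convert a per-frame processing time in milliseconds to frames per second."""
    return 1000.0 / ms_per_frame


def mean_pixel_error(predicted, ground_truth):
    """Mean Euclidean distance in pixels between paired (x, y) points.

    Assumption: the abstract's "average error" is measured this way.
    """
    if not predicted or len(predicted) != len(ground_truth):
        raise ValueError("need two non-empty point lists of equal length")
    total = sum(math.dist(p, g) for p, g in zip(predicted, ground_truth))
    return total / len(predicted)


def rasterize_trajectory(points, size=28, margin=2):
    """Draw a fingertip trajectory onto a size x size binary grid.

    Hypothetical preprocessing step: the trajectory is scaled to fit the
    grid (keeping its aspect ratio) and consecutive points are joined by
    straight line segments.
    """
    if not points:
        raise ValueError("trajectory is empty")
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    min_x, min_y = min(xs), min(ys)
    # Use the larger extent so the aspect ratio is preserved; avoid dividing by 0.
    span = max(max(xs) - min_x, max(ys) - min_y, 1)
    scale = (size - 1 - 2 * margin) / span

    # Map image coordinates to (column, row) grid cells.
    cells = [
        (margin + round((x - min_x) * scale), margin + round((y - min_y) * scale))
        for x, y in points
    ]

    grid = [[0] * size for _ in range(size)]
    col0, row0 = cells[0]
    grid[row0][col0] = 1  # covers the single-point case
    for (c0, r0), (c1, r1) in zip(cells, cells[1:]):
        steps = max(abs(c1 - c0), abs(r1 - r0), 1)
        for i in range(steps + 1):
            c = c0 + round((c1 - c0) * i / steps)
            r = r0 + round((r1 - r0) * i / steps)
            grid[r][c] = 1
    return grid


if __name__ == "__main__":
    # 25 ms per frame, as reported in the abstract.
    print(frames_per_second(25))
    # Toy data, not taken from the paper.
    print(mean_pixel_error([(0, 0), (3, 4)], [(0, 0), (0, 0)]))
    for row in rasterize_trajectory([(100, 100), (200, 100), (200, 180)]):
        print("".join("#" if v else "." for v in row))
```

At 25 ms per frame the detector processes 40 frames per second, which is the basis for the real-time claim. The rasterized grid is only one plausible way to present a trajectory to an image classifier; the abstract does not specify how the trajectories are encoded.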
