Department of Microwave Engineering, Harbin Institute of Technology, Harbin 150001, China.
Sensors (Basel). 2021 Jan 2;21(1):259. doi: 10.3390/s21010259.
The purpose of this paper was to investigate the effect of training state-of-the-art convolutional neural networks (CNNs) for millimeter-wave radar-based hand gesture recognition (MR-HGR). Focusing on the small-training-dataset problem in MR-HGR, this paper first proposed to transfer the knowledge embedded in CNN models from computer vision to MR-HGR by fine-tuning the models with radar data samples. Meanwhile, to accommodate the different data modality in MR-HGR, a parameterized representation, the temporal space-velocity (TSV) spectrogram, was proposed as an integrated data modality of the time-evolving hand gesture features in the radar echo signals. TSV spectrograms representing six common gestures in human-computer interaction (HCI), collected from nine volunteers, were used as the data samples in the experiment. The evaluated models included ResNet with 50, 101, and 152 layers, DenseNet with 121, 161, and 169 layers, as well as the lightweight MobileNet V2 and ShuffleNet V2, most of which were proposed in recent publications. In the experiment, not only self-testing (ST) but also the more persuasive cross-testing (CT) was implemented to evaluate whether the fine-tuned models generalize to the radar data samples. The CT results show that the best fine-tuned models reach an average accuracy higher than 93%, with a comparable ST average accuracy of almost 100%. Moreover, to alleviate the problem caused by personal gesture habits, an auxiliary test was performed by augmenting the training set with four shots of the most heavily misclassified gestures. This enriching test resembles the scenario in which a tablet adapts to a new user. The results for two different volunteers in the enriching test show that the average accuracy on the enriched gestures improves from 55.59% and 65.58% to 90.66% and 95.95%, respectively. Compared with baseline work in MR-HGR, the investigation in this paper can be beneficial in promoting MR-HGR in future industrial applications and consumer electronics design.
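The abstract's core idea, fine-tuning ImageNet-pretrained CNN backbones on TSV spectrograms rendered as images, can be illustrated with the minimal PyTorch sketch below. It is not the authors' code: the dataset directory, image size, optimizer settings, and epoch count are assumptions for illustration only; only the six-class gesture setup and the use of a pretrained ResNet-50 backbone come from the abstract.

```python
# Hypothetical sketch: fine-tuning an ImageNet-pretrained ResNet-50 on TSV
# spectrogram images for 6-class gesture recognition. Paths, image size, and
# hyperparameters are assumptions, not values reported in the paper.
import torch
import torch.nn as nn
from torchvision import models, transforms, datasets
from torch.utils.data import DataLoader

NUM_GESTURES = 6  # six HCI gestures, as stated in the abstract

# Load a backbone pretrained on ImageNet and replace its classifier head.
model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)
model.fc = nn.Linear(model.fc.in_features, NUM_GESTURES)

# TSV spectrograms rendered as RGB images; normalization follows the
# ImageNet statistics the backbone was trained with.
preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

# "tsv_spectrograms/train" is a placeholder directory with one subfolder per
# gesture class (torchvision ImageFolder layout).
train_set = datasets.ImageFolder("tsv_spectrograms/train", transform=preprocess)
train_loader = DataLoader(train_set, batch_size=32, shuffle=True)

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = model.to(device)
criterion = nn.CrossEntropyLoss()
# A small learning rate so fine-tuning only gently adjusts the pretrained weights.
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)

model.train()
for epoch in range(10):  # number of epochs is an assumption
    for images, labels in train_loader:
        images, labels = images.to(device), labels.to(device)
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
```

The enriching test described in the abstract would reuse the same loop: a few labeled shots of the new user's most misclassified gestures are appended to the training set and the already fine-tuned model is briefly trained again.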