School of Electronics and Communication Engineering, Guangzhou University, Guangzhou, 510006, China.
Key Laboratory of On-Chip Communication and Sensor Chip of Guangdong Higher Education Institutes, Guangzhou, 510006, China.
Sci Rep. 2023 Oct 21;13(1):17996. doi: 10.1038/s41598-023-45149-5.
Radar-based human activity recognition (HAR) offers a non-contact technique that preserves privacy and is robust to lighting conditions, making it suitable for many advanced applications. Complex deep neural networks show significant performance advantages when classifying radar micro-Doppler signals, which correspond uniquely to human behaviors. However, in embedded applications, the demand for lightweight, low-latency models poses challenges for constructing radar-based HAR networks. In this paper, an efficient network based on a lightweight hybrid Vision Transformer (LH-ViT) is proposed to achieve high HAR accuracy and a lightweight network simultaneously. The network combines efficient convolution operations with the strength of the self-attention mechanism in ViT. A feature pyramid architecture is applied for multi-scale feature extraction from the micro-Doppler map. Feature enhancement is then performed by stacked Radar-ViT blocks, in which fold and unfold operations are added to lower the computational load of the attention mechanism. The convolution operator in the LH-ViT is replaced by the RES-SE block, an efficient structure that combines the residual learning framework with the Squeeze-and-Excitation network. Experiments on two human activity datasets indicate that our method has advantages in expressiveness and computing efficiency over traditional methods.
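To make the RES-SE idea concrete, the following is a minimal sketch of a residual connection combined with Squeeze-and-Excitation channel gating, written in plain Python for readability. It assumes the common SE-ResNet arrangement (gate the transformed features, then add the input); the abstract does not specify the exact layout, so the ordering, the bias-free bottleneck, and all names here (`res_se`, `placeholder_transform`, `w_reduce`, `w_expand`) are illustrative assumptions rather than the authors' implementation.

```python
import math
import random


def squeeze(x):
    """Global average pooling: one scalar per channel. x is indexed [C][H][W]."""
    return [sum(map(sum, ch)) / (len(ch) * len(ch[0])) for ch in x]


def dense(v, weights):
    """Matrix-vector product without bias; weights is indexed [out][in]."""
    return [sum(w * a for w, a in zip(row, v)) for row in weights]


def excite(z, w_reduce, w_expand):
    """Bottleneck: reduce -> ReLU -> expand -> sigmoid, giving gates in (0, 1)."""
    hidden = [max(0.0, h) for h in dense(z, w_reduce)]
    return [1.0 / (1.0 + math.exp(-g)) for g in dense(hidden, w_expand)]


def res_se(x, transform, w_reduce, w_expand):
    """Compute y = x + gate * F(x), with one learned gate per channel."""
    f = transform(x)  # F(x): stands in for the convolutional layers
    gates = excite(squeeze(f), w_reduce, w_expand)
    return [
        [[xv + g * fv for xv, fv in zip(x_row, f_row)]
         for x_row, f_row in zip(x_ch, f_ch)]
        for x_ch, f_ch, g in zip(x, f, gates)
    ]


def placeholder_transform(x):
    """Hypothetical stand-in for the convolution stack; returns x unchanged."""
    return x


# Toy input: 4 channels of a 3x3 map, channel reduction ratio 2 (assumed values).
random.seed(0)
C, H, W, R = 4, 3, 3, 2
x = [[[random.uniform(-1, 1) for _ in range(W)] for _ in range(H)]
     for _ in range(C)]
w_reduce = [[random.uniform(-1, 1) for _ in range(C)] for _ in range(C // R)]
w_expand = [[random.uniform(-1, 1) for _ in range(C // R)] for _ in range(C)]

y = res_se(x, placeholder_transform, w_reduce, w_expand)
print(len(y), len(y[0]), len(y[0][0]))  # output keeps the input's C, H, W
```

The gating adds only two small matrix-vector products per block, which is why this kind of structure is attractive when the network must stay lightweight. In a real model the placeholder transform would be a trained convolution stack and the weights would be learned.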