Microelectronic Systems Design Research Group, Department of Electrical and Computer Engineering, Technische Universität Kaiserslautern, 67663 Kaiserslautern, Germany.
German Research Center for Artificial Intelligence, DFKI, 67663 Kaiserslautern, Germany.
Sensors (Basel). 2020 May 16;20(10):2828. doi: 10.3390/s20102828.
Human hand pose estimation has become the basis for many vital applications in which the hand pose serves as the main system input. Virtual reality (VR) headsets, the Shadow Dexterous Hand, and in-air signature verification are a few examples of applications that require tracking hand movements in real time. State-of-the-art 3D hand pose estimation methods are based on convolutional neural networks (CNNs). These methods are implemented mainly on graphics processing units (GPUs) because of their extensive computational requirements. However, GPUs are not suitable for practical application scenarios in which low power consumption is crucial. Furthermore, the difficulty of embedding a bulky GPU into a small device limits the portability of such applications to mobile devices. The goal of this work is to provide an energy-efficient solution for an existing depth-camera-based hand pose estimation algorithm. First, we compress the deep neural network model by applying dynamic quantization techniques to different layers to achieve maximum compression without compromising accuracy. Afterwards, we design a custom hardware architecture. We selected an FPGA as the target platform because FPGAs provide high energy efficiency and can be integrated into portable devices. Our solution, implemented on a Xilinx UltraScale+ MPSoC FPGA, is 4.2× faster and 577.3× more energy efficient than the original implementation of the hand pose estimation algorithm on an NVIDIA GeForce GTX 1070.
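The per-layer compression step can be illustrated with a minimal sketch. It assumes that "dynamic quantization" is read as dynamic fixed-point, in which every layer chooses its own fractional length from the range of its weights. The abstract does not specify the exact scheme, so the function names (`choose_frac_bits`, `quantize`, `dequantize`), the 8-bit word length, and the example layer values are all hypothetical and are not taken from the authors' implementation.

```python
import math

def choose_frac_bits(values, total_bits=8):
    """Pick a fractional length so the largest magnitude fits a signed word.

    Assumption: one fractional length per layer (dynamic fixed-point).
    """
    max_abs = max(abs(v) for v in values)
    if max_abs == 0:
        return total_bits - 1
    # Bits needed left of the binary point; negative for small-valued layers.
    int_bits = math.floor(math.log2(max_abs)) + 1
    return total_bits - 1 - int_bits  # one bit is reserved for the sign

def quantize(values, total_bits=8):
    """Return integer codes and the fractional length chosen for this layer."""
    frac_bits = choose_frac_bits(values, total_bits)
    scale = 2.0 ** frac_bits
    qmin = -(2 ** (total_bits - 1))
    qmax = 2 ** (total_bits - 1) - 1
    # Round to the nearest code and saturate at the word limits.
    codes = [min(qmax, max(qmin, round(v * scale))) for v in values]
    return codes, frac_bits

def dequantize(codes, frac_bits):
    """Map integer codes back to real values for error inspection."""
    scale = 2.0 ** frac_bits
    return [c / scale for c in codes]

if __name__ == "__main__":
    # Hypothetical weights for two layers with different dynamic ranges.
    layers = {"conv1": [0.5, -0.25, 0.125], "fc": [3.2, -1.7, 0.05]}
    for name, weights in layers.items():
        codes, frac_bits = quantize(weights)
        print(name, frac_bits, codes, dequantize(codes, frac_bits))
```

In this sketch a layer with small weights receives more fractional bits than a layer with large weights, so both use the same word length at different resolutions. This is one way a per-layer choice can keep accuracy while shrinking the model.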