Wang Yangang, Zhang Baowen, Peng Cong
IEEE Trans Image Process. 2019 Nov 28. doi: 10.1109/TIP.2019.2955280.
This paper introduces SRHandNet, a novel method for real-time 2D hand pose estimation from monocular color images. Existing methods cannot efficiently obtain adequate results for small hands. Our key idea is to simultaneously regress the hand regions of interest (RoIs) and the hand keypoints for a given color image, and to iteratively feed the hand RoIs back into the network to improve hand keypoint estimation, all within a single encoder-decoder architecture. Unlike previous region proposal networks (RPNs), we propose a new lightweight bounding box representation called the region map. The region map and the hand keypoint heatmaps are combined into unified multi-channel feature maps, which are obtained with a single forward pass and thus improve the runtime efficiency of the network. Without implementation optimization, SRHandNet runs at 40 fps for hand bounding box detection and at up to 30 fps for accurate hand keypoint estimation in a desktop environment. Experiments demonstrate the effectiveness of the proposed method, which achieves state-of-the-art results and outperforms all recent methods.
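The unified multi-channel output and the RoI feedback loop described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the channel layout, so everything concrete here is an assumption made for illustration: 21 keypoint heatmaps followed by a 3-channel region map (center confidence, then box width and height as fractions of the map size), an output with the same spatial size as its input, and a hypothetical callable `net` standing in for the trained encoder-decoder.

```python
import numpy as np

NUM_KEYPOINTS = 21  # assumed keypoint count; not stated in the abstract


def decode_keypoints(heatmaps):
    """Return one (x, y, score) row per heatmap channel via argmax."""
    k, h, w = heatmaps.shape
    flat = heatmaps.reshape(k, -1)
    idx = flat.argmax(axis=1)
    ys, xs = np.unravel_index(idx, (h, w))
    scores = flat[np.arange(k), idx]
    return np.stack([xs, ys, scores], axis=1).astype(float)


def decode_box(region_map):
    """Decode one box from the assumed 3-channel region map.

    Channel 0 is the center confidence; channels 1 and 2 hold the box
    width and height as fractions of the map size, read at the most
    confident center.
    """
    _, h, w = region_map.shape
    cy, cx = np.unravel_index(region_map[0].argmax(), (h, w))
    bw = region_map[1, cy, cx] * w
    bh = region_map[2, cy, cx] * h
    x0, y0 = max(cx - bw / 2, 0.0), max(cy - bh / 2, 0.0)
    x1, y1 = min(cx + bw / 2, float(w)), min(cy + bh / 2, float(h))
    return float(x0), float(y0), float(x1), float(y1), float(region_map[0, cy, cx])


def split_and_decode(output):
    """Split one unified (K + 3, H, W) output into keypoints and a box."""
    heatmaps, region_map = output[:NUM_KEYPOINTS], output[NUM_KEYPOINTS:]
    return decode_keypoints(heatmaps), decode_box(region_map)


def estimate_with_feedback(image, net, iterations=2):
    """Run `net`, crop to the predicted RoI, and run again on the crop.

    `net` is a hypothetical callable mapping an (H, W) image to a
    (K + 3, H, W) output. Keypoints are returned in the coordinates of
    the original image.
    """
    crop, off_x, off_y = image, 0, 0
    keypoints = None
    for _ in range(iterations):
        keypoints, (x0, y0, x1, y1, _) = split_and_decode(net(crop))
        keypoints[:, 0] += off_x  # map crop coordinates back to the image
        keypoints[:, 1] += off_y
        x0, y0 = int(np.floor(x0)), int(np.floor(y0))
        x1, y1 = int(np.ceil(x1)), int(np.ceil(y1))
        if x1 - x0 < 2 or y1 - y0 < 2:
            break  # RoI too small to refine further
        crop = crop[y0:y1, x0:x1]
        off_x, off_y = off_x + x0, off_y + y0
    return keypoints
```

Because both outputs come from the same array, a single forward pass yields the box and the keypoints together; the feedback loop only adds further passes on the cropped RoI, where a small hand occupies a larger share of the input.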