IEEE Trans Pattern Anal Mach Intell. 2019 Jan;41(1):107-120. doi: 10.1109/TPAMI.2017.2784424. Epub 2017 Dec 18.
In this paper, we consider the problem of estimating the head pose and body orientation of a person from a low-resolution image. In this setting, it is difficult to reliably extract facial features or detect body parts. We propose a convolutional random projection forest (CRPforest) algorithm for these tasks. A convolutional random projection network (CRPnet) is used at each node of the forest. It maps an input image to a high-dimensional feature space using a rich filter bank. The filter bank is designed to generate sparse responses so that they can be computed efficiently by compressive sensing. A sparse random projection matrix can capture most of the essential information contained in the filter bank without using all of its filters. Therefore, the CRPnet processes an image quickly, owing to the small number of convolutions (e.g., 0.01 percent of a layer of a neural network), at a cost of less than 2 percent in accuracy. The overall forest estimates head and body pose well on benchmark datasets, e.g., over 98 percent accuracy on the HIIT dataset, without using a GPU. Extensive experiments on challenging datasets show that the proposed algorithm performs favorably against state-of-the-art methods on low-resolution images with noise, occlusion, and motion blur.
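One way to read the claim that a sparse random projection can capture most of the filter bank's information without using every filter: convolution is linear, so projecting the full set of filter responses gives the same result as convolving the image with a few random combinations of the filters. The NumPy sketch below illustrates that identity only. The function names, the Achlioptas-style sparse matrix with sparsity parameter `s`, and the random filter bank are illustrative assumptions, not the CRPnet implementation described in the paper.

```python
import numpy as np


def sparse_projection_matrix(n_out, n_in, s=3.0, seed=0):
    """Achlioptas-style sparse random matrix (an assumed choice).

    Entries are +sqrt(s), 0, -sqrt(s) with probabilities
    1/(2s), 1 - 1/s, 1/(2s), so most entries are zero.
    """
    rng = np.random.default_rng(seed)
    signs = rng.choice(
        [1.0, 0.0, -1.0],
        size=(n_out, n_in),
        p=[1 / (2 * s), 1 - 1 / s, 1 / (2 * s)],
    )
    return np.sqrt(s) * signs


def correlate_valid(image, kernel):
    """'Valid' 2-D cross-correlation, i.e. the convolution used in CNNs."""
    windows = np.lib.stride_tricks.sliding_window_view(image, kernel.shape)
    return np.einsum("ijkl,kl->ij", windows, kernel)


def project_filter_bank(filters, projection):
    """Mix a bank of shape (n_in, kh, kw) down to (n_out, kh, kw)."""
    return np.tensordot(projection, filters, axes=([1], [0]))


def compressed_features(image, filters, projection):
    """Run only n_out convolutions instead of one per original filter."""
    projected = project_filter_bank(filters, projection)
    return np.stack([correlate_valid(image, k) for k in projected])


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    image = rng.standard_normal((20, 20))       # stand-in low-resolution image
    filters = rng.standard_normal((64, 5, 5))   # stand-in filter bank
    R = sparse_projection_matrix(8, 64)
    feats = compressed_features(image, filters, R)
    print(feats.shape)  # 8 feature maps from 8 convolutions, not 64
```

Because the projection is applied to the filters before any convolution is run, the number of convolutions equals the number of rows of the projection matrix rather than the size of the filter bank. How many rows are needed to preserve accuracy is an empirical question that this sketch does not address.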