IEEE Trans Pattern Anal Mach Intell. 2021 Aug;43(8):2874-2881. doi: 10.1109/TPAMI.2020.3046323. Epub 2021 Jul 1.
We present a deep learning-based multi-task approach for head pose estimation in images. We contribute a network architecture and training strategy that harness the strong dependencies among face pose, alignment and visibility to produce a top-performing model for all three tasks. Our architecture is an encoder-decoder CNN with residual blocks and lateral skip connections. We show that combining head pose estimation with landmark-based face alignment significantly improves the performance of the former task. Further, placing the pose task at the bottleneck layer, at the end of the encoder, and the tasks that depend on spatial information, such as visibility and alignment, at the final decoder layer also contributes to the final performance. In our experiments, the proposed model outperforms the state of the art in the face pose and visibility tasks. By including a final landmark regression step, it also produces face alignment results on par with the state of the art.
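The task placement described above can be made concrete with a shape-level sketch. The following is only an illustration under stated assumptions, not the authors' implementation: the input size, channel widths, number of stages, the 68-landmark count, the three-angle pose output and the heatmap form of the alignment output are all hypothetical choices, and `Shape` and `trace_layout` are names invented here. It traces feature-map shapes through a generic encoder-decoder with lateral skips and records which layer each task head reads from.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Shape:
    channels: int
    height: int
    width: int


def trace_layout(input_size=128, base_channels=32, depth=4, num_landmarks=68):
    """Trace feature-map shapes through a hypothetical encoder-decoder.

    All sizes are illustrative assumptions, not the paper's configuration.
    input_size is assumed to be divisible by 2 ** depth.
    """
    # Encoder: each stage halves the resolution and doubles the channels.
    encoder = [Shape(base_channels, input_size, input_size)]
    for _ in range(depth):
        prev = encoder[-1]
        encoder.append(Shape(prev.channels * 2, prev.height // 2, prev.width // 2))

    # The bottleneck is the last (coarsest) encoder output.
    bottleneck = encoder[-1]

    # Decoder: each stage doubles the resolution, halves the channels, and
    # merges the encoder feature map of matching shape (lateral skip).
    decoder = []
    current = bottleneck
    for skip in reversed(encoder[:-1]):
        current = Shape(current.channels // 2, current.height * 2, current.width * 2)
        assert current == skip, "a lateral skip needs matching shapes"
        decoder.append(current)

    heads = {
        # Pose is a global quantity, so its head reads the bottleneck.
        # Three output values (e.g. yaw, pitch, roll) are an assumption.
        "pose": {"input": bottleneck, "output": (3,)},
        # Spatial tasks read the final, full-resolution decoder layer.
        "visibility": {"input": decoder[-1], "output": (num_landmarks,)},
        # One heatmap per landmark is an assumed output format.
        "alignment": {
            "input": decoder[-1],
            "output": (num_landmarks, input_size, input_size),
        },
    }
    return encoder, decoder, heads


if __name__ == "__main__":
    _, _, heads = trace_layout()
    for task, spec in heads.items():
        print(task, spec["input"], spec["output"])
```

With the default (assumed) sizes, the pose head sees a coarse, channel-rich feature map, while the visibility and alignment heads see a feature map at the input resolution, which is the distinction the abstract draws between the global pose task and the tasks that depend on spatial information.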