
Attention-Guided Huber Loss for Head Pose Estimation Based on Improved Capsule Network.

Author Information

Zhong Runhao, He Li, Wang Hongwei, Yuan Liang, Li Kexin, Liu Zhening

Affiliations

School of Mechanical Engineering, Xinjiang University, Urumqi 830046, China.

School of Information Science and Technology, Beijing University of Chemical Technology, Beijing 100029, China.

Publication Information

Entropy (Basel). 2023 Jul 5;25(7):1024. doi: 10.3390/e25071024.

DOI: 10.3390/e25071024
PMID: 37509971
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10378512/
Abstract

Head pose estimation is an important technology for analyzing human behavior and has been widely researched and applied in areas such as human-computer interaction and fatigue detection. However, traditional head pose estimation networks tend to lose spatial structure information, particularly in complex scenarios where occlusions and multiple detected objects are common, resulting in low accuracy. To address these issues, we propose a head pose estimation model based on a residual network and a capsule network. First, a deep residual network extracts features at three stages, capturing spatial structure information at different levels, and a global attention block enhances the spatial weighting of feature extraction. To avoid losing spatial structure information, the features are encoded and passed to the output by an improved capsule network, whose generalization ability is strengthened by a self-attention routing mechanism. To improve the robustness of the model, we optimize the Huber loss, which is applied to head pose estimation for the first time. Finally, experiments are conducted on three popular public datasets: 300W-LP, AFLW2000, and BIWI. The results demonstrate that the proposed method achieves state-of-the-art performance, particularly in scenarios with occlusions.
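The abstract does not spell out the paper's attention-guided variant, but the standard Huber loss it builds on can be sketched as follows. This is a minimal illustration under the usual definition, not the authors' exact formulation; `delta` is the conventional threshold where the loss switches from quadratic to linear, which is what damps the influence of outlier pose errors relative to plain squared error.

```python
def huber_loss(pred: float, target: float, delta: float = 1.0) -> float:
    """Standard per-element Huber loss: quadratic for residuals up to
    delta, linear beyond it. Small errors are penalized like MSE;
    large (outlier) errors grow only linearly, like MAE."""
    r = abs(pred - target)
    if r <= delta:
        return 0.5 * r * r          # quadratic region
    return delta * (r - 0.5 * delta)  # linear region, continuous at r = delta
```

In a pose-regression setting this would typically be averaged over the three Euler angles (yaw, pitch, roll) of each sample; the paper's contribution is an attention-guided weighting of this base loss.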

Figures (PMC):
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0589/10378512/236b3dc2f57b/entropy-25-01024-g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0589/10378512/584f86c92fb6/entropy-25-01024-g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0589/10378512/652d27a24b55/entropy-25-01024-g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0589/10378512/af1ec3936575/entropy-25-01024-g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0589/10378512/4251e18d2074/entropy-25-01024-g005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0589/10378512/486aa6d7b4d0/entropy-25-01024-g006.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0589/10378512/97508977bac1/entropy-25-01024-g007.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0589/10378512/9be7302394bc/entropy-25-01024-g008.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0589/10378512/625894f3b0f0/entropy-25-01024-g009.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0589/10378512/e95f1089d328/entropy-25-01024-g010.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0589/10378512/6d6f8afcbfae/entropy-25-01024-g011.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0589/10378512/a2ff46b7ef3b/entropy-25-01024-g012.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0589/10378512/3936763dded3/entropy-25-01024-g013.jpg

Similar Articles

1. Attention-Guided Huber Loss for Head Pose Estimation Based on Improved Capsule Network. Entropy (Basel). 2023 Jul 5;25(7):1024. doi: 10.3390/e25071024.
2. An Improved Tiered Head Pose Estimation Network with Self-Adjust Loss Function. Entropy (Basel). 2022 Jul 14;24(7):974. doi: 10.3390/e24070974.
3. Self-Attention Mechanism-Based Head Pose Estimation Network with Fusion of Point Cloud and Image Features. Sensors (Basel). 2023 Dec 18;23(24):9894. doi: 10.3390/s23249894.
4. Head Pose Estimation through Keypoints Matching between Reconstructed 3D Face Model and 2D Image. Sensors (Basel). 2021 Mar 6;21(5):1841. doi: 10.3390/s21051841.
5. Deep Learning-Based 6-DoF Object Pose Estimation Considering Synthetic Dataset. Sensors (Basel). 2023 Dec 15;23(24):9854. doi: 10.3390/s23249854.
6. Detection, segmentation, and 3D pose estimation of surgical tools using convolutional neural networks and algebraic geometry. Med Image Anal. 2021 May;70:101994. doi: 10.1016/j.media.2021.101994. Epub 2021 Feb 7.
7. MSA-Net: Establishing Reliable Correspondences by Multiscale Attention Network. IEEE Trans Image Process. 2022;31:4598-4608. doi: 10.1109/TIP.2022.3186535. Epub 2022 Jul 12.
8. MH6D: Multi-Hypothesis Consistency Learning for Category-Level 6-D Object Pose Estimation. IEEE Trans Neural Netw Learn Syst. 2025 Mar;36(3):4820-4833. doi: 10.1109/TNNLS.2024.3360712. Epub 2025 Feb 28.
9. DON6D: a decoupled one-stage network for 6D pose estimation. Sci Rep. 2024 Apr 10;14(1):8410. doi: 10.1038/s41598-024-59152-x.
10. Online Learning State Evaluation Method Based on Face Detection and Head Pose Estimation. Sensors (Basel). 2024 Feb 20;24(5):1365. doi: 10.3390/s24051365.

References Cited in This Article

1. A Novel Zernike Moment-Based Real-Time Head Pose and Gaze Estimation Framework for Accuracy-Sensitive Applications. Sensors (Basel). 2022 Nov 3;22(21):8449. doi: 10.3390/s22218449.
2. An Improved Tiered Head Pose Estimation Network with Self-Adjust Loss Function. Entropy (Basel). 2022 Jul 14;24(7):974. doi: 10.3390/e24070974.
3. Efficient-CapsNet: capsule network with self-attention routing. Sci Rep. 2021 Jul 19;11(1):14634. doi: 10.1038/s41598-021-93977-0.
4. Head Pose Estimation through Keypoints Matching between Reconstructed 3D Face Model and 2D Image. Sensors (Basel). 2021 Mar 6;21(5):1841. doi: 10.3390/s21051841.
5. FASHE: A FrActal Based Strategy for Head Pose Estimation. IEEE Trans Image Process. 2021;30:3192-3203. doi: 10.1109/TIP.2021.3059409. Epub 2021 Feb 25.
6. Towards Robust and Accurate Multi-View and Partially-Occluded Face Alignment. IEEE Trans Pattern Anal Mach Intell. 2018 Apr;40(4):987-1001. doi: 10.1109/TPAMI.2017.2697958. Epub 2017 Apr 25.
7. The adaptive Hough transform. IEEE Trans Pattern Anal Mach Intell. 1987 May;9(5):690-8. doi: 10.1109/tpami.1987.4767964.
8. Head pose estimation in computer vision: a survey. IEEE Trans Pattern Anal Mach Intell. 2009 Apr;31(4):607-26. doi: 10.1109/TPAMI.2008.106.