School of Information Science and Technology, Dalian Maritime University, Dalian 116026, China.
Sensors (Basel). 2022 Apr 20;22(9):3154. doi: 10.3390/s22093154.
Head pose and eye gaze are vital cues for analysing a driver's visual attention. Previous approaches achieve promising results from point clouds under constrained conditions. However, these approaches face challenges in complex naturalistic driving scenes. One challenge is that point cloud data collected under non-uniform illumination and large head rotations is prone to partial facial occlusion, which can produce bad transformations when template matching fails or features are extracted incorrectly. In this paper, a novel estimation method is proposed for predicting accurate driver head pose and gaze zone using an RGB-D camera, with an effective point cloud fusion and registration strategy. In the fusion step, to reduce bad transformations, consecutive multi-frame point clouds are registered and fused to generate a stable point cloud. In the registration step, to reduce reliance on template registration, multiple point clouds from the nearest-neighbor gaze zone are used as the template point cloud. A coarse transformation computed by the normal distributions transform (NDT) serves as the initial transformation and is then updated with a particle filter. A gaze zone estimator is trained on a combination of head pose and eye image features, where the head pose is predicted by point cloud registration and the eye image features are extracted via multi-scale sparse coding. Extensive experiments demonstrate that the proposed strategy achieves better head pose tracking results and a low gaze zone classification error.
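As an illustration of the coarse-to-fine registration idea described above, the following minimal Python/NumPy sketch refines a coarse head rotation estimate with a simple sampling-importance-resampling particle filter. It is not the authors' implementation. It assumes the coarse estimate (obtained by NDT in the paper) is already given as Z-Y-X Euler angles, that both point clouds are centered so only rotation is estimated, and that each particle is scored by the mean nearest-neighbor distance between the rotated source cloud and the template. The function names `rotation_matrix`, `mean_nn_distance` and `refine_pose_pf`, and all parameter values, are hypothetical.

```python
import numpy as np


def rotation_matrix(yaw, pitch, roll):
    """Z-Y-X Euler angles (radians) -> 3x3 rotation matrix R = Rz @ Ry @ Rx."""
    cy, sy = np.cos(yaw), np.sin(yaw)
    cp, sp = np.cos(pitch), np.sin(pitch)
    cr, sr = np.cos(roll), np.sin(roll)
    rz = np.array([[cy, -sy, 0.0], [sy, cy, 0.0], [0.0, 0.0, 1.0]])
    ry = np.array([[cp, 0.0, sp], [0.0, 1.0, 0.0], [-sp, 0.0, cp]])
    rx = np.array([[1.0, 0.0, 0.0], [0.0, cr, -sr], [0.0, sr, cr]])
    return rz @ ry @ rx


def mean_nn_distance(src, dst):
    """Mean distance from each src point to its nearest dst point (brute force)."""
    d = np.linalg.norm(src[:, None, :] - dst[None, :, :], axis=2)
    return d.min(axis=1).mean()


def refine_pose_pf(source, template, init_angles, n_particles=200,
                   n_iters=20, sigma=0.05, seed=0):
    """Refine a coarse (yaw, pitch, roll) estimate with a particle filter.

    Hypothetical sketch: both clouds are assumed centered, so only rotation
    is estimated; the coarse estimate is taken as given (e.g. from NDT).
    """
    rng = np.random.default_rng(seed)
    init = np.asarray(init_angles, dtype=float)

    def cost(angles):
        # Rotate the source cloud and score its alignment with the template.
        return mean_nn_distance(source @ rotation_matrix(*angles).T, template)

    best, best_err = init.copy(), cost(init)
    # Spread particles around the coarse initial estimate.
    particles = init + rng.normal(0.0, sigma, size=(n_particles, 3))
    for _ in range(n_iters):
        errs = np.array([cost(p) for p in particles])
        i = int(errs.argmin())
        if errs[i] < best_err:
            best, best_err = particles[i].copy(), errs[i]
        # Lower alignment error -> higher importance weight.
        w = np.exp(-(errs - errs.min()) / (errs.std() + 1e-12))
        w /= w.sum()
        # Resample in proportion to weight, then jitter to keep diversity.
        idx = rng.choice(n_particles, size=n_particles, p=w)
        particles = particles[idx] + rng.normal(0.0, 0.5 * sigma, size=(n_particles, 3))
    return best, best_err


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    template = rng.uniform(-1.0, 1.0, size=(80, 3))  # stand-in for a fused face cloud
    true = np.array([0.3, -0.2, 0.1])
    source = template @ rotation_matrix(*true)        # source @ R(true).T == template
    coarse = true + np.array([0.12, -0.10, 0.08])     # pretend coarse estimate
    angles, err = refine_pose_pf(source, template, coarse)
    print("refined angles:", angles, "error:", err)
```

Brute-force nearest-neighbor search keeps the sketch dependency-free; for real face clouds with thousands of points, a k-d tree (for example `scipy.spatial.cKDTree`) would be the practical choice for scoring particles.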