Graduate School of Creative Science and Engineering, Waseda University, Tokyo 169-8555, Japan.
Research Institute for Science and Engineering (RISE), Waseda University, Tokyo 162-0044, Japan.
Sensors (Basel). 2022 Aug 5;22(15):5857. doi: 10.3390/s22155857.
Estimating a driver's gaze in a natural, real-world setting is difficult under challenging conditions: while driving, the face may be partially occluded, unevenly illuminated, or seen in varying positions. In this work, we aim to reduce misclassifications in driving situations where the driver's face is at different distances from the camera. Three-dimensional Convolutional Neural Network (3D CNN) models can build a spatio-temporal representation of the driver by extracting features encoded across multiple adjacent frames, which can describe motion. This property may help compensate for the lack of context information in a per-frame recognition system. The front, navigator, right window, left window, back mirror, and speed meter are among the areas that drivers commonly check. Based on this, we implement and evaluate a model that detects the head direction toward these regions when the driver's face is at various distances from the camera. In our evaluation, the 2D CNN model had a mean average recall of 74.96% across the three models, whereas the 3D CNN model had a mean average recall of 87.02%. These results show that our proposed 3D CNN-based approach outperforms a 2D CNN per-frame recognition approach in driving situations where the driver's face is at different distances from the camera.
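The abstract reports a "mean average recall" without defining it. One plausible reading, offered here only as an assumption, is the per-class (macro-averaged) recall over the gaze regions, averaged again over the evaluated models. The minimal Python sketch below illustrates that reading; the function names and the toy confusion matrices are hypothetical and are not taken from the paper.

```python
def per_class_recall(confusion):
    """Recall of each class from a square confusion matrix.

    Rows are true classes and columns are predicted classes, so the
    recall of class i is the diagonal entry divided by the row sum.
    """
    recalls = []
    for i, row in enumerate(confusion):
        total = sum(row)
        # A class with no true samples contributes a recall of 0.0 here;
        # this convention is an assumption, not something the paper states.
        recalls.append(row[i] / total if total else 0.0)
    return recalls


def mean_average_recall(confusions):
    """Average the macro recall of several models (assumed definition)."""
    per_model = []
    for confusion in confusions:
        recalls = per_class_recall(confusion)
        per_model.append(sum(recalls) / len(recalls))
    return sum(per_model) / len(per_model)


# Toy two-class confusion matrices for two hypothetical models.
# These numbers are illustrative only and are NOT results from the paper.
toy_confusions = [
    [[8, 2], [1, 9]],
    [[5, 5], [0, 10]],
]
toy_score = mean_average_recall(toy_confusions)
```

With the six regions named in the abstract, each confusion matrix would be 6 x 6, and the same two functions would apply unchanged.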