Zhou Shuhui, Zhang Wei, Liu Yulong, Chen Xiaonian, Liu Huajie
CGNPC Uranium Resources Co., Ltd., Beijing 100084, China.
Suzhou Automotive Research Institute (Wujiang), Tsinghua University, Suzhou 215200, China.
Sensors (Basel). 2025 Sep 5;25(17):5548. doi: 10.3390/s25175548.
Driver distraction is a key factor contributing to traffic accidents. However, among existing computer-vision methods for driver attention state recognition, monocular camera approaches often suffer from low accuracy, while multi-sensor data fusion techniques are compromised by poor real-time performance. To address these limitations, this paper proposes a Real-time Driver Attention State Recognition method (RT-DASR). RT-DASR comprises two core components: Binocular Vision Depth-Compensated Head Pose Estimation (BV-DHPE) and Multi-source Temporal Bidirectional Long Short-Term Memory (MSTBi-LSTM). BV-DHPE employs binocular cameras and YOLO11n (You Only Look Once) Pose to locate facial landmarks, then computes their spatial distances from binocular disparity, compensating for the depth information missing in monocular imaging and yielding accurate pose estimates. MSTBi-LSTM uses a lightweight Bidirectional Long Short-Term Memory (Bi-LSTM) network to fuse head pose angles, real-time vehicle speed, and gaze-region semantics, extracting temporal features in both directions for continuous attention-state discrimination. Evaluated under challenging conditions (e.g., illumination changes, occlusion), BV-DHPE achieved a 44.7% reduction in head pose Mean Absolute Error (MAE) compared with monocular vision methods. Deployed on an NVIDIA Jetson Orin, RT-DASR achieved 90.4% attention recognition accuracy with an average latency of 21.5 ms. Real-world driving scenario tests confirm that the proposed method provides a high-precision, low-latency attention-state recognition solution for enhancing the safety of mining vehicle drivers, and RT-DASR can be integrated into advanced driver assistance systems to enable proactive accident prevention.
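The depth-compensation idea behind BV-DHPE rests on the classic stereo relation Z = f·B/d: once the same facial landmark is found in both views, its disparity yields metric depth, and the back-projected 3-D landmarks support pose angles that a monocular image cannot recover. The sketch below illustrates this under assumed values; the focal length, baseline, landmark pixels, and the two-eye yaw computation are illustrative, not the paper's calibration or landmark set.

```python
import math

def depth_from_disparity(f_px: float, baseline_m: float, disparity_px: float) -> float:
    """Stereo depth Z = f * B / d: f_px is focal length in pixels, baseline_m
    the camera separation in metres, disparity_px the horizontal pixel shift
    of the same landmark between the left and right views."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive for a point in front of the rig")
    return f_px * baseline_m / disparity_px

def landmark_3d(u, v, f_px, cx, cy, z):
    """Back-project pixel (u, v) to camera coordinates via the pinhole model."""
    return ((u - cx) * z / f_px, (v - cy) * z / f_px, z)

def yaw_from_eyes(left_eye, right_eye):
    """Illustrative yaw estimate: angle of the inter-ocular axis in the X-Z plane."""
    dx = right_eye[0] - left_eye[0]
    dz = right_eye[2] - left_eye[2]
    return math.degrees(math.atan2(dz, dx))

# Hypothetical rig: 800 px focal length, 12 cm baseline, 640x480 principal point.
f, B, cx, cy = 800.0, 0.12, 320.0, 240.0
z_left = depth_from_disparity(f, B, disparity_px=96.0)    # 1.0 m
z_right = depth_from_disparity(f, B, disparity_px=100.0)  # 0.96 m
le = landmark_3d(280, 230, f, cx, cy, z_left)
re = landmark_3d(360, 230, f, cx, cy, z_right)
print(round(yaw_from_eyes(le, re), 1))
```

The key point is the last step: with only a monocular view, both eyes would sit at an unknown common depth and the head would appear frontal; the per-landmark depths from disparity are what expose the rotation.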
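The MSTBi-LSTM fusion step can be sketched as follows: each frame's features (head-pose angles, vehicle speed, gaze-region code) form one timestep of a sequence, an LSTM processes the sequence in both temporal directions, and the two passes are concatenated per frame. The hand-rolled NumPy cell, layer sizes, and feature layout below are illustrative assumptions; for brevity the two directions share weights here, whereas a real Bi-LSTM learns a separate set per direction.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_pass(seq, Wx, Wh, b, hidden):
    """One directional LSTM pass over seq of shape (T, D); returns (T, hidden)."""
    h = np.zeros(hidden)
    c = np.zeros(hidden)
    outs = []
    for x in seq:
        z = Wx @ x + Wh @ h + b            # four gates stacked: i, f, o, g
        i, f, o = (sigmoid(z[k * hidden:(k + 1) * hidden]) for k in range(3))
        g = np.tanh(z[3 * hidden:])
        c = f * c + i * g                  # cell state update
        h = o * np.tanh(c)                 # hidden state
        outs.append(h)
    return np.stack(outs)

def bilstm(seq, Wx, Wh, b, hidden):
    """Bidirectional pass: forward states plus time-aligned backward states."""
    fwd = lstm_pass(seq, Wx, Wh, b, hidden)
    bwd = lstm_pass(seq[::-1], Wx, Wh, b, hidden)[::-1]
    return np.concatenate([fwd, bwd], axis=1)

rng = np.random.default_rng(0)
T, D, H = 30, 5, 16   # 30 frames; 5 features: yaw, pitch, roll, speed, gaze-region id
seq = rng.normal(size=(T, D))
Wx = rng.normal(scale=0.1, size=(4 * H, D))
Wh = rng.normal(scale=0.1, size=(4 * H, H))
b = np.zeros(4 * H)
features = bilstm(seq, Wx, Wh, b, H)
print(features.shape)  # per-frame bidirectional features, (T, 2H)
```

A classifier head over these per-frame features would then emit the continuous attention-state decision; the bidirectional pass lets each frame's feature vector reflect both the preceding and the following context, which suits post hoc discrimination of sustained distraction episodes.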