Foshan Xianhu Laboratory of the Advanced Energy Science and Technology Guangdong Laboratory, Xianhu Hydrogen Valley, Foshan 528200, China.
Hubei Key Laboratory of Advanced Technology for Automotive Components, Wuhan University of Technology, Wuhan 430070, China.
Sensors (Basel). 2022 Oct 20;22(20):8027. doi: 10.3390/s22208027.
An accurate object pose is essential for assessing an object's state and predicting its movements. In recent years, researchers have typically predicted object poses either by matching an image against a virtual 3D model or by regressing the six-degree-of-freedom pose of the target directly from pixel data with deep learning methods. However, these approaches can overlook a fact established in the early days of computer vision research: the arrangement of an object's parts strongly encodes the object's pose. In this study, we propose YAEN (yaw angle estimation network), a novel and lightweight deep learning framework for accurately predicting an object's yaw angle from a monocular camera based on the arrangement of its parts. YAEN uses an encoder-decoder structure for vehicle yaw angle prediction: a part-encoding network extracts the arrangement of vehicle parts, and a yaw angle decoding network recovers the yaw angle from that arrangement. Because the part information is already refined by the encoder, the decoding network is lightweight; the YAEN model has modest hardware requirements and reaches a detection speed of 97 FPS on an RTX 2070S graphics card. To improve the performance of our model, we use asymmetric convolutions and a sign-preserving SSE (sum of squared errors) loss function. To verify the effectiveness of the model, we constructed an accurate yaw angle dataset under real-world conditions using two vehicles equipped with high-precision positioning devices. Experimental results show that our method achieves satisfactory prediction performance in scenarios where vehicles do not occlude each other, with an average prediction error below 3.1° and 96.45% of predictions within 10° of the ground truth in real driving scenarios.
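The abstract reports two evaluation figures: the mean yaw prediction error (below 3.1°) and the fraction of predictions within 10° of the ground truth (96.45%). As a minimal sketch of how such metrics can be computed — the exact evaluation code is not given in the abstract, so the function names and the wrap-around convention below are assumptions — one plausible implementation is:

```python
import math

def angular_error_deg(pred_deg: float, true_deg: float) -> float:
    """Smallest absolute difference between two yaw angles, in degrees.
    Wraps around the circle so that e.g. 359 deg vs 1 deg gives 2 deg."""
    diff = (pred_deg - true_deg) % 360.0
    return min(diff, 360.0 - diff)

def evaluate_yaw(preds, truths, threshold_deg: float = 10.0):
    """Return (mean absolute angular error, fraction of errors under threshold)."""
    errors = [angular_error_deg(p, t) for p, t in zip(preds, truths)]
    mean_err = sum(errors) / len(errors)
    accuracy = sum(e < threshold_deg for e in errors) / len(errors)
    return mean_err, accuracy

# Tiny illustrative example (made-up numbers, not the paper's data):
preds = [10.0, 355.0, 182.0, 90.0]
truths = [12.0, 2.0, 180.0, 75.0]
mean_err, accuracy = evaluate_yaw(preds, truths)
# errors are 2, 7, 2, 15 degrees -> mean 6.5, accuracy 0.75
```

The modulo-based wrap-around matters for yaw: a prediction of 359° against a ground truth of 1° is a 2° error, not 358°, and omitting this step would badly inflate both reported metrics.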