Jun Woomin, Lee Sungjin
Korea Electronics Technology Institute, Seongnam 13488, Republic of Korea.
Department of Smart Automotive, Soonchunhyang University, Asan 31538, Republic of Korea.
Sensors (Basel). 2025 Apr 4;25(7):2300. doi: 10.3390/s25072300.
This study addresses the optimization of a camera-based bird's eye view (BEV) segmentation technique that operates in real time within an embedded system environment while maintaining high accuracy despite limited computational resources. Specifically, it examines three technical approaches to BEV segmentation for autonomous driving: depth-based, MLP-based, and transformer-based methods, focusing on key techniques such as lift-splat-shoot, HDMapNet, and BEVFormer. A mathematical analysis of these methods is conducted, followed by a comparative performance evaluation on the nuScenes dataset. The optimization process was carried out in three stages: accuracy improvement, latency reduction, and model size optimization. In the first stage, the three modules for BEV segmentation (encoder, view transformation, and decoder) were selected with the goal of maximizing mIoU. In the second stage, environmental variables were optimized through input resolution adaptation and data augmentation to further improve accuracy. In the third stage, model compression was applied to minimize model size and latency for efficient deployment on embedded systems. Experimental results from the first stage show that the lift-splat-shoot view transformation model with an InternImage-B encoder and EfficientNet-B0 decoder achieved the highest performance, 54.9 mIoU, at an input image size of 448×800. Notably, the lift-splat-shoot model with an InternImage-T encoder and EfficientNet-B0 decoder reached 53.1 mIoU while remaining highly efficient (51.7 ms latency and a 159.5 MB model size). The second stage revealed that increasing the input resolution does not always improve accuracy; each model has an optimal resolution, and in this study the best performance was obtained at 448×800. In the third stage, FP16 quantization halved memory size and decreased latency while maintaining similar or identical mIoU. When deployed on the power-constrained NVIDIA AGX Orin device, energy efficiency improved, although latency increased under certain power supply conditions. Overall, the InternImage encoder-based lift-splat-shoot technique achieved the highest accuracy relative to latency and model size, outperforming the original method with a 29.2% higher mIoU at similar latency and a 32.2% smaller memory footprint.
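To make the depth-based pipeline concrete, the following is a minimal sketch of the "lift" step of lift-splat-shoot in PyTorch: each pixel's 2D feature is spread across a predicted categorical depth distribution via an outer product, producing frustum features that are later "splatted" onto the BEV grid. Layer sizes, the number of depth bins, and the stride-16 feature shape are illustrative assumptions, not the configuration evaluated in the paper.

```python
# Minimal sketch of the lift-splat-shoot "lift" step. All sizes are
# illustrative assumptions, not the paper's configuration.
import torch
import torch.nn as nn

class LiftHead(nn.Module):
    """Predicts a per-pixel depth distribution and lifts 2D image features
    into a frustum of 3D features via an outer product."""
    def __init__(self, in_channels=256, feat_channels=64, num_depth_bins=41):
        super().__init__()
        self.num_depth_bins = num_depth_bins
        # One 1x1 conv jointly predicts depth logits and lifted features.
        self.head = nn.Conv2d(in_channels, num_depth_bins + feat_channels, 1)

    def forward(self, x):                                   # x: (B, C, H, W)
        out = self.head(x)
        depth_logits = out[:, :self.num_depth_bins]         # (B, D, H, W)
        feats = out[:, self.num_depth_bins:]                # (B, C', H, W)
        depth_prob = depth_logits.softmax(dim=1)
        # Outer product: each pixel's feature is weighted by the probability
        # of every depth bin -> (B, D, C', H, W) frustum features.
        return depth_prob.unsqueeze(2) * feats.unsqueeze(1)

feats = torch.randn(1, 256, 28, 50)   # e.g. a 448x800 input at stride 16
frustum = LiftHead()(feats)
print(frustum.shape)                  # torch.Size([1, 41, 64, 28, 50])
```

The subsequent "splat" step would accumulate these frustum features into BEV grid cells using the camera geometry, after which the decoder produces the segmentation map.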
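Since the stage-one comparisons are reported in mIoU, a small reference implementation of the metric may be helpful. The integer class-map inputs and the convention of skipping classes absent from both prediction and ground truth are assumptions for illustration; the paper's exact evaluation protocol may differ.

```python
# Mean intersection-over-union for BEV segmentation class maps.
import torch

def mean_iou(pred: torch.Tensor, target: torch.Tensor, num_classes: int) -> torch.Tensor:
    """pred, target: (B, H, W) integer class maps on the BEV grid."""
    ious = []
    for c in range(num_classes):
        p = pred == c
        t = target == c
        union = (p | t).sum()
        if union > 0:                      # skip classes absent from both maps
            inter = (p & t).sum()
            ious.append(inter.float() / union.float())
    return torch.stack(ious).mean()

pred = torch.randint(0, 4, (1, 200, 200))  # e.g. a 200x200 BEV grid, 4 classes
target = torch.randint(0, 4, (1, 200, 200))
print(f"mIoU: {mean_iou(pred, target, num_classes=4):.3f}")
```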
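The third-stage FP16 quantization can be illustrated with a short PyTorch sketch: casting weights from 32-bit to 16-bit floats halves parameter memory, matching the roughly 50% reduction reported above. The tiny stand-in network and the PyTorch-only casting are assumptions; actual deployment on the AGX Orin would typically go through an embedded runtime such as TensorRT with FP16 enabled.

```python
# FP16 post-training conversion, sketched in PyTorch.
import torch.nn as nn

model = nn.Sequential(                  # hypothetical stand-in, not the paper's model
    nn.Conv2d(3, 64, 3, padding=1),
    nn.ReLU(),
    nn.Conv2d(64, 16, 1),
)

def param_megabytes(m: nn.Module) -> float:
    """Total parameter storage in MB."""
    return sum(p.numel() * p.element_size() for p in m.parameters()) / 1e6

fp32_mb = param_megabytes(model)
model = model.half()                    # cast every weight from FP32 to FP16
fp16_mb = param_megabytes(model)
print(f"{fp32_mb:.3f} MB -> {fp16_mb:.3f} MB")  # FP16 uses half the bytes
```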