Subhedar Javed, Bachute Mrinal R
Department of Electronics and Telecommunication, Symbiosis Institute of Technology, Pune, India.
MethodsX. 2025 May 23;14:103387. doi: 10.1016/j.mex.2025.103387. eCollection 2025 Jun.
One of the critical tasks of an autonomous driving system is perception (sensing its surroundings), which involves semantic segmentation. Semantic segmentation is a vital computer vision task that assigns a label to every pixel in the input image. It partitions the scene seen by the autonomous vehicle into meaningful segments by classifying and labelling every image pixel according to its semantic category. This paper gives insights into the DeepNet V3+ architecture with two alternative feature-extraction backbones: ResNet50V2 and EfficientNetV2. The impact of the Squeeze-and-Excitation (SE) module and the Convolutional Block Attention Module (CBAM) on semantic segmentation is also compared for these architectures using the CamVid dataset. All six models are evaluated on categorical accuracy and mean Intersection over Union (mIoU). The highest categorical accuracy, 97.25 %, was achieved by the model with the ResNet50V2 backbone, which reached a mean IoU of 80.56 %.
• Feature extraction using the DeepNet V3+ architecture with ResNet50V2 and EfficientNetV2 as the backbone.
• Insights into using the Squeeze-and-Excitation module and the Convolutional Block Attention Module in the DeepNet V3+ architecture.
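The two evaluation metrics named in the abstract can be made concrete with a small sketch. It is a minimal pure-Python illustration that assumes the common definitions: categorical accuracy is the fraction of pixels whose predicted class equals the ground-truth class, and mean IoU averages the per-class ratio TP / (TP + FP + FN) over the classes that occur in the ground truth or the prediction, which is how Keras' `MeanIoU` metric averages. The paper's exact averaging rule is not stated in the abstract, and the helper names (`confusion_matrix`, `categorical_accuracy`, `mean_iou`) and the toy label maps are hypothetical.

```python
from typing import List

Matrix = List[List[int]]


def flatten(label_map: List[List[int]]) -> List[int]:
    """Turn a 2-D per-pixel label map into a flat list of class ids."""
    return [px for row in label_map for px in row]


def confusion_matrix(y_true: List[int], y_pred: List[int], num_classes: int) -> Matrix:
    """cm[t][p] counts pixels whose true class is t and predicted class is p."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm


def categorical_accuracy(cm: Matrix) -> float:
    """Correctly labelled pixels (the diagonal) divided by all pixels."""
    correct = sum(cm[i][i] for i in range(len(cm)))
    total = sum(sum(row) for row in cm)
    return correct / total


def mean_iou(cm: Matrix) -> float:
    """Average of TP / (TP + FP + FN) over classes with a non-zero denominator."""
    n = len(cm)
    ious = []
    for c in range(n):
        tp = cm[c][c]
        fp = sum(cm[r][c] for r in range(n)) - tp  # predicted c, but truly another class
        fn = sum(cm[c]) - tp                        # truly c, but predicted another class
        denom = tp + fp + fn
        if denom > 0:  # skip classes absent from both ground truth and prediction
            ious.append(tp / denom)
    return sum(ious) / len(ious)


# Toy 2x3 "images" with 3 classes (hypothetical data, not from the paper).
truth = [[0, 0, 1],
         [1, 2, 2]]
pred = [[0, 1, 1],
        [1, 2, 0]]

cm = confusion_matrix(flatten(truth), flatten(pred), num_classes=3)
print(cm)                                  # → [[1, 1, 0], [0, 2, 0], [1, 0, 1]]
print(round(categorical_accuracy(cm), 4))  # → 0.6667
print(round(mean_iou(cm), 4))              # → 0.5
```

The toy example shows why both metrics are reported: most pixels are labelled correctly (accuracy ≈ 0.67), but each class is penalised for both its false positives and false negatives, so the class-averaged IoU is lower (0.5). On street scenes dominated by large classes such as road and sky, accuracy can therefore be high while mIoU remains noticeably lower.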