Yogamani Senthil, Sistu Ganesh, Denny Patrick, Courtney Jane
School of Electrical & Electronic Engineering, Technological University Dublin, D07 ADY7 Dublin, Ireland.
D2ICE Research Centre, University of Limerick, V94 T9PX Limerick, Ireland.
Sensors (Basel). 2025 Jun 14;25(12):3735. doi: 10.3390/s25123735.
Object detection is a mature problem in autonomous driving, with pedestrian detection being one of the first commercially deployed algorithms. It has been extensively studied in the literature. However, object detection is relatively less explored for the fisheye cameras used in surround-view near-field sensing. The standard bounding-box representation fails on fisheye images due to heavy radial distortion, particularly in the periphery. In this paper, a generic object detection framework is implemented using the base YOLO (You Only Look Once) detector to systematically explore various object representations on the public WoodScape dataset. First, we implement basic representations, namely the standard bounding box, the oriented bounding box, and the ellipse. Secondly, we implement a generic polygon and propose a novel curvature-adaptive polygon, which obtains an improvement of 3 mAP (mean average precision) points. A polygon is expensive to annotate and complex to use in downstream tasks, making it impractical for real-world applications. However, we use it to demonstrate that the accuracy gap between the polygon and the bounding-box representation is substantial due to the strong distortion in fisheye cameras. This motivates the design of a distortion-aware optimal bounding-box representation for fisheye images, in which objects tend to appear banana-shaped near the periphery. We derive a novel representation called a curved box and improve it further by leveraging vanishing-point constraints. The proposed curved box representations outperform the bounding box by 3 mAP points and the oriented bounding box by 1.6 mAP points. In addition, a camera geometry tensor is formulated to provide adaptation to the non-linear distortion characteristics of fisheye cameras, improving performance by a further 1.4 mAP points.
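The failure mode that motivates the curved-box representation can be illustrated with a minimal sketch. The snippet below (an illustrative assumption, not the paper's actual camera model or code) compares an ideal pinhole projection with the common equidistant fisheye model, where the image radius is r = f·θ for a ray at angle θ from the optical axis. Projecting a straight vertical object edge placed off-axis shows that the pinhole image points stay collinear, while the fisheye image points bow outward, so an axis-aligned box can no longer fit the object tightly:

```python
import math

def project_pinhole(X, Y, Z, f=500.0):
    # Ideal pinhole: straight 3D lines project to straight image lines.
    return (f * X / Z, f * Y / Z)

def project_equidistant_fisheye(X, Y, Z, f=500.0):
    # Equidistant fisheye model: image radius r = f * theta, where theta
    # is the angle between the viewing ray and the optical axis.
    r3 = math.sqrt(X * X + Y * Y)
    theta = math.atan2(r3, Z)
    if r3 == 0.0:
        return (0.0, 0.0)
    r = f * theta
    return (r * X / r3, r * Y / r3)

def deviation_from_chord(pts):
    # Maximum perpendicular distance of intermediate points from the
    # straight line joining the endpoints (zero for a collinear set).
    (x0, y0), (x1, y1) = pts[0], pts[-1]
    dx, dy = x1 - x0, y1 - y0
    norm = math.hypot(dx, dy)
    return max(abs(dy * (x - x0) - dx * (y - y0)) / norm
               for x, y in pts[1:-1])

# A straight vertical object edge placed well off the optical axis,
# i.e. in the periphery of the field of view (units are arbitrary).
edge = [(2.0, i * 0.2 - 1.0, 2.5) for i in range(11)]

pin = [project_pinhole(*p) for p in edge]
fish = [project_equidistant_fisheye(*p) for p in edge]

print(f"pinhole deviation from straight line: {deviation_from_chord(pin):.3f} px")
print(f"fisheye deviation from straight line: {deviation_from_chord(fish):.3f} px")
```

Under this model the fisheye trace of the edge deviates by several pixels from any straight segment, which is the geometric effect the paper's curved-box representation is designed to capture.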