Mohammed Asaad, Ibrahim Hosny M, Omar Nagwa M
Information Technology Department, Faculty of Computers and Information, Assiut University, Assiut, 71515, Egypt.
Sci Rep. 2025 Jun 20;15(1):20101. doi: 10.1038/s41598-025-02888-x.
Object detection is a fundamental task in computer vision. Detectors fall into two primary types: one-stage detectors, known for their high speed and efficiency, and two-stage detectors, which offer higher accuracy but are often slower due to their more complex architecture. Balancing these two aspects has been a significant challenge in the field. RetinaNet, a premier single-stage object detector, is renowned for its remarkable balance between speed and accuracy. Its success is largely due to the groundbreaking focal loss function, which adeptly addresses the class imbalance prevalent in object detection tasks. This innovation significantly enhances detection accuracy while maintaining high speed, making RetinaNet an ideal choice for a wide range of real-world applications. However, its performance degrades on datasets containing objects with unusual characteristics, such as elongated or squat shapes, for which the default anchor parameters may not be well suited. To overcome this limitation, we present an enhancement to the RetinaNet model that improves its ability to handle variations in objects across different domains. Specifically, we propose an optimization algorithm based on Differential Evolution (DE) that adjusts anchor scales and ratios and determines the most appropriate number of these parameters for each dataset from the annotated data. Through extensive experiments on datasets spanning diverse domains, including the Karlsruhe Institute of Technology and Toyota Technological Institute (KITTI) dataset, the Unconstrained Face Detection Dataset (UFDD), the TomatoPlantFactoryDataset, and the widely used Common Objects in Context (COCO) 2017 benchmark, we demonstrate that our proposed method outperforms both the original RetinaNet and anchor-free methods by a considerable margin.
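To make the DE-based anchor tuning concrete, the following is a minimal sketch (not the authors' implementation): it uses SciPy's differential_evolution to search anchor scales and aspect ratios so that annotated box shapes are well covered, scoring a candidate by the mean best shape IoU between ground-truth boxes and the anchor set. The objective, search bounds, base anchor size, and the fixed number of scales and ratios shown here are illustrative assumptions; the paper additionally selects the number of these parameters per dataset.

```python
# Minimal sketch of DE-based anchor scale/ratio tuning (illustrative assumptions,
# not the authors' code). Requires numpy and scipy.
import numpy as np
from scipy.optimize import differential_evolution

def anchor_whs(scales, ratios, base_size=32.0):
    """Widths/heights of every scale-ratio anchor combination at one pyramid level."""
    s = np.asarray(scales)[:, None] * base_size            # (k, 1) anchor sizes
    r = np.asarray(ratios)[None, :]                        # (1, m) aspect ratios h/w
    w = s / np.sqrt(r)                                     # (k, m)
    h = s * np.sqrt(r)
    return np.stack([w.ravel(), h.ravel()], axis=1)        # (k*m, 2)

def neg_mean_best_iou(params, gt_wh, n_scales):
    """Negative mean best IoU between GT box shapes and anchors (centers aligned);
    DE minimizes this value."""
    scales, ratios = params[:n_scales], params[n_scales:]
    anchors = anchor_whs(scales, ratios)                                # (A, 2)
    inter = np.minimum(gt_wh[:, None, :], anchors[None, :, :]).prod(-1) # (N, A)
    union = gt_wh.prod(-1)[:, None] + anchors.prod(-1)[None, :] - inter
    return -np.mean((inter / union).max(axis=1))

# gt_wh: (N, 2) widths/heights of annotated boxes, rescaled to one FPN level.
# Placeholder data stands in for a real dataset's annotations.
gt_wh = np.abs(np.random.randn(500, 2)) * 40 + 10

n_scales, n_ratios = 3, 3                                    # assumed fixed here
bounds = [(0.5, 2.0)] * n_scales + [(0.2, 5.0)] * n_ratios   # assumed search ranges
result = differential_evolution(neg_mean_best_iou, bounds,
                                args=(gt_wh, n_scales), seed=0, maxiter=100)
print("scales:", result.x[:n_scales])
print("ratios:", result.x[n_scales:])
print("mean best IoU:", -result.fun)
```

The optimized scales and ratios would then replace RetinaNet's default anchor configuration before training; in practice, repeating the search for several candidate values of n_scales and n_ratios and keeping the best-scoring configuration is one way to choose the number of anchor parameters per dataset.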