Object detection using YOLO: challenges, architectural successors, datasets and applications.

Author information

Diwan Tausif, Anirudh G, Tembhurne Jitendra V

Affiliations

Department of Computer Science & Engineering, Indian Institute of Information Technology, Nagpur, India.

Department of Data Science and Analytics, Central University of Rajasthan, Jaipur, Rajasthan, India.

Publication information

Multimed Tools Appl. 2023;82(6):9243-9275. doi: 10.1007/s11042-022-13644-y. Epub 2022 Aug 8.

DOI:10.1007/s11042-022-13644-y

PMID:35968414

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC9358372/

Abstract

Object detection is one of the predominant and challenging problems in computer vision. Over the past decade, with the rapid evolution of deep learning, researchers have experimented extensively and contributed to improving the performance of object detection and related tasks such as object classification, localization, and segmentation using underlying deep models. Broadly, object detectors are classified into two categories: two-stage and single-stage detectors. Two-stage detectors rely on a selective region-proposal strategy implemented through a complex architecture, whereas single-stage detectors consider all spatial regions as candidate detections in one shot, using a relatively simpler architecture. The performance of any object detector is evaluated through detection accuracy and inference time. Generally, two-stage detectors achieve higher detection accuracy than single-stage detectors, while single-stage detectors have shorter inference times. Moreover, with the advent of YOLO (You Only Look Once) and its architectural successors, detection accuracy has been improving significantly and is sometimes better than that of two-stage detectors. YOLOs are adopted in various applications mainly because of their faster inference rather than their detection accuracy. For example, the detection accuracies of YOLO and Fast-RCNN are 63.4 and 70, respectively, but inference is around 300 times faster with YOLO. In this paper, we present a comprehensive review of single-stage object detectors, especially YOLOs, covering their regression formulation, architectural advancements, and performance statistics. Moreover, we summarize comparisons between two-stage and single-stage object detectors and among different versions of YOLO, applications based on two-stage detectors and on different versions of YOLO, and future research directions.
