Generalized Focal Loss: Towards Efficient Representation Learning for Dense Object Detection.

Publication Information

IEEE Trans Pattern Anal Mach Intell. 2023 Mar;45(3):3139-3153. doi: 10.1109/TPAMI.2022.3180392. Epub 2023 Feb 3.

Abstract

Object detection is a fundamental computer vision task that simultaneously predicts the category and location of the targets of interest. Recently, one-stage (also termed "dense") detectors have gained much attention over two-stage ones due to their simple pipeline and suitability for deployment on end devices. Dense object detectors generally formulate object detection as dense classification and localization (i.e., bounding box regression). The classification is usually optimized by Focal Loss, and the box location is commonly learned under a Dirac delta distribution. A recent trend for dense detectors is to introduce an individual prediction branch that estimates localization quality, which in turn facilitates the classification and improves detection performance. This paper delves into the representations of the above three fundamental elements: quality estimation, classification, and localization. Three problems are identified in existing practices: (1) the inconsistent usage of the quality estimation and classification between training and inference, (2) the inflexible Dirac delta distribution for localization, and (3) the deficient and implicit guidance for accurate quality estimation. To address these problems, we design new representations for these elements. Specifically, we merge the quality estimation into the class prediction vector to form a joint representation, use a vector to represent an arbitrary distribution of box locations, and extract discriminative feature descriptors from the distribution vector for more reliable quality estimation. The improved representations eliminate the inconsistency risk and accurately depict the flexible distributions in real data, but they contain continuous labels, which is beyond the scope of Focal Loss. We therefore propose Generalized Focal Loss (GFocal), which generalizes Focal Loss from its discrete form to a continuous version for successful optimization. Extensive experiments demonstrate the effectiveness of our method without sacrificing efficiency in either training or inference. Based on GFocal, we construct a considerably fast and lightweight detector, termed NanoDet, for mobile settings; it is 1.8 AP higher, 2x faster, and 6x smaller than a scaled YoloV4-Tiny.
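
To make the abstract's core idea concrete, GFocal's two instantiations can be written out directly. Quality Focal Loss (QFL) supervises the joint classification-quality score sigma with a continuous IoU label y in [0, 1] via QFL(sigma) = -|y - sigma|^beta ((1 - y) log(1 - sigma) + y log(sigma)), and Distribution Focal Loss (DFL) learns a general distribution of box locations over discretized offset bins. The PyTorch sketch below follows these published formulas, but it is only a minimal illustration: the function and argument names (quality_focal_loss, pred_dist_logits, beta, and so on) are ours, not taken from the authors' released code, and details such as loss weighting and normalization are omitted.

    import torch.nn.functional as F

    def quality_focal_loss(pred_logits, quality_targets, beta=2.0):
        # pred_logits: (N, C) raw classification logits.
        # quality_targets: (N, C) continuous labels in [0, 1]; the IoU of the
        # matched box at the ground-truth class index, zero everywhere else.
        sigma = pred_logits.sigmoid()
        # BCE with a soft target y equals -[(1 - y) log(1 - s) + y log(s)].
        bce = F.binary_cross_entropy_with_logits(
            pred_logits, quality_targets, reduction='none')
        # |y - s|^beta down-weights well-estimated examples, generalizing
        # Focal Loss's (1 - p_t)^gamma from discrete {0, 1} labels to
        # continuous quality labels.
        return ((quality_targets - sigma).abs().pow(beta) * bce).sum()

    def distribution_focal_loss(pred_dist_logits, target):
        # pred_dist_logits: (N, n_bins) logits over discretized box offsets.
        # target: (N,) continuous regression targets in [0, n_bins - 1].
        n_bins = pred_dist_logits.size(-1)
        target = target.clamp(0, n_bins - 1 - 1e-4)
        left = target.long()              # nearest bin at or below the target
        right = left + 1                  # nearest bin above the target
        w_left = right.float() - target   # linear interpolation weights
        w_right = target - left.float()
        log_p = F.log_softmax(pred_dist_logits, dim=-1)
        # -[(y_r - y) log p_l + (y - y_l) log p_r] concentrates probability
        # mass on the two bins that bracket the continuous target.
        loss = -(w_left * log_p.gather(1, left.unsqueeze(1)).squeeze(1)
                 + w_right * log_p.gather(1, right.unsqueeze(1)).squeeze(1))
        return loss.sum()

The paper reports beta = 2 as a good default for QFL; in practice, both losses are typically normalized over the number of positive samples rather than summed as above.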
