Two-Stage Pedestrian Detection Model Using a New Classification Head for Domain Generalization.

Authors

Schulz Daniel, Perez Claudio A

Affiliations

Department of Electrical Engineering, and Advanced Mining Technology Center, Universidad de Chile, Santiago 8370451, Chile.

IMPACT, Center of Interventional Medicine for Precision and Advanced Cellular Therapy, Santiago 7620086, Chile.

Publication

Sensors (Basel). 2023 Nov 24;23(23):9380. doi: 10.3390/s23239380.

DOI:10.3390/s23239380

PMID:38067753

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC10708870/

Abstract

Deep-learning-based pedestrian detection has achieved great success in the past few years, with several possible real-world applications including autonomous driving, robotic navigation, and video surveillance. In this work, we present a new two-stage neural network pedestrian detector with a custom classification head that adds a triplet loss to the standard bounding box regression and classification losses. The aim is to improve the domain generalization capabilities of existing pedestrian detectors by explicitly maximizing inter-class distance and minimizing intra-class distance. The triplet loss is applied to the features generated by the region proposal network, with the aim of clustering pedestrian samples together in the feature space. We used Faster R-CNN and Cascade R-CNN with an HRNet backbone pre-trained on ImageNet, replacing the standard classification head in Faster R-CNN and one of the three heads in Cascade R-CNN. The best results were obtained with a progressive training pipeline that starts from a dataset far from the target domain and progressively fine-tunes on datasets closer to the target domain. We obtained state-of-the-art results on the CityPersons benchmark, with MR-2 of 9.9, 11.0, and 36.2 on the reasonable, small, and heavy subsets, respectively; performance was outstanding on the heavy subset, the most difficult one.
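As a rough illustration of the idea in the abstract, the sketch below computes a margin-based triplet loss on toy feature vectors and adds it to placeholder classification and regression losses. It is not the authors' implementation: the Euclidean distance, the margin value, the `triplet_weight` parameter, and the function names `triplet_loss` and `total_loss` are all assumptions made for illustration, since the abstract does not specify them.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Margin-based triplet loss (margin value is an assumption).

    The loss is zero once the negative is at least `margin` farther
    from the anchor than the positive, which pulls same-class samples
    together and pushes other-class samples away.
    """
    d_pos = euclidean(anchor, positive)  # intra-class distance
    d_neg = euclidean(anchor, negative)  # inter-class distance
    return max(0.0, d_pos - d_neg + margin)


def total_loss(cls_loss, bbox_loss, trip_loss, triplet_weight=1.0):
    """Hypothetical weighted sum of the three loss terms."""
    return cls_loss + bbox_loss + triplet_weight * trip_loss


# Toy 2-D features: anchor and positive stand for pedestrian proposals,
# the negatives stand for background proposals.
anchor = [0.0, 0.0]
positive = [3.0, 4.0]       # distance 5 from the anchor
easy_negative = [6.0, 8.0]  # distance 10: already well separated
hard_negative = [0.0, 1.0]  # distance 1: closer than the positive

easy = triplet_loss(anchor, positive, easy_negative)  # no penalty
hard = triplet_loss(anchor, positive, hard_negative)  # penalized
combined = total_loss(0.5, 0.25, hard, triplet_weight=0.5)
```

In this toy case only the hard negative contributes to the loss, which is the behavior that encourages pedestrian features to cluster apart from background features.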
