Zhang Shenyong, Wang Wenmin, Wang Zhibing, Li Honglei, Li Ruochen, Zhang Shixiong
School of Computer Science and Engineering, Macau University of Science and Technology, Macau 999078, China.
School of Computer Technology, Beijing Institute of Technology, Zhuhai 519088, China.
Sensors (Basel). 2024 Dec 7;24(23):7833. doi: 10.3390/s24237833.
Traditional object detectors require extensive instance-level annotations for training. In contrast, few-shot object detectors, which are generally fine-tuned using limited data from unknown classes, tend to be biased toward base categories and are sensitive to variations within these unknown samples. To mitigate these challenges, we introduce a Two-Stage Fine-Tuning Approach (TFA) named Extreme R-CNN, designed to operate effectively with extremely limited original samples through the integration of sample synthesis and knowledge distillation. Our approach synthesizes new training examples via instance clipping and various data-augmentation techniques. We enhance the Faster R-CNN architecture by decoupling the regression and classification components of the Region of Interest (RoI) head, allowing synthetic samples to train the classification head independently of the object-localization process. Comprehensive evaluations on the Microsoft COCO and PASCAL VOC datasets demonstrate significant improvements over baseline methods. Specifically, on the PASCAL VOC dataset, the average precision for novel categories improves by up to 15 percent, while on the more complex Microsoft COCO benchmark it improves by up to 6.1 percent. Remarkably, on the PASCAL VOC dataset, the AP50 of our model in the 1-shot scenario exceeds that of the baseline model in the 10-shot setting, confirming the efficacy of our proposed method.
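The sample-synthesis step described in the abstract (instance clipping followed by data augmentation) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say which augmentations are used, so the horizontal flip and brightness scaling below are stand-ins, and the function names, the `(x1, y1, x2, y2)` box convention, and the grayscale list-of-rows image format are all assumptions made for illustration. Because such crops carry no surrounding scene context, they are only suitable for training a classification head, which is consistent with the decoupled RoI design the abstract describes.

```python
import random


def clip_instance(image, box):
    """Crop one annotated instance from a grayscale image (list of pixel rows).

    `box` is a hypothetical (x1, y1, x2, y2) pixel box with x2/y2 exclusive.
    """
    x1, y1, x2, y2 = box
    height, width = len(image), len(image[0])
    # Clamp the box to the image bounds before slicing.
    x1, y1 = max(0, x1), max(0, y1)
    x2, y2 = min(width, x2), min(height, y2)
    if x2 <= x1 or y2 <= y1:
        raise ValueError("box does not overlap the image")
    return [row[x1:x2] for row in image[y1:y2]]


def horizontal_flip(patch):
    """Mirror the patch left to right."""
    return [row[::-1] for row in patch]


def scale_brightness(patch, factor):
    """Multiply every pixel by `factor`, clamping to the 0..255 range."""
    return [[min(255, max(0, round(p * factor))) for p in row] for row in patch]


def synthesize_samples(image, box, label, rng):
    """Return (patch, label) pairs: the clipped instance plus two augmented copies.

    The two augmentations are illustrative stand-ins, not the paper's list.
    """
    crop = clip_instance(image, box)
    factor = rng.uniform(0.8, 1.2)  # assumed brightness jitter range
    return [
        (crop, label),
        (horizontal_flip(crop), label),
        (scale_brightness(crop, factor), label),
    ]


if __name__ == "__main__":
    # Toy 6x8 image whose pixel value encodes its position.
    image = [[r * 8 + c for c in range(8)] for r in range(6)]
    for patch, label in synthesize_samples(image, (2, 1, 6, 5), "bird", random.Random(0)):
        print(label, len(patch), "x", len(patch[0]))
```

In this sketch a single annotated instance yields three classification samples. A real pipeline would operate on color images and would feed the patches to the classification branch only, leaving the box-regression branch to be trained on the original annotated images.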