Ibrahim Bekkouch Imad Eddine, Eyharabide Victoria, Le Page Valérie, Billiet Frédéric
Sorbonne Center for Artificial Intelligence, Sorbonne University, 75005 Paris, France.
STIH Laboratory, Sorbonne University, 75005 Paris, France.
J Imaging. 2022 Jan 19;8(2):18. doi: 10.3390/jimaging8020018.
Detecting objects for which only a few example images are available is a challenging task, especially when the style of the images differs greatly from that of modern photographs, as is the case for cultural heritage datasets. This problem is commonly known as few-shot object detection and is still a young field of research. This article presents a simple and effective method for black-box few-shot object detection that works with all current state-of-the-art object detection models. We also present a new dataset for medieval musicological studies, called MMSD, that contains five classes and 693 samples manually annotated by a group of musicology experts. Because of the significant diversity of styles and the considerable disparities between the artistic representations of the objects, our dataset is more challenging than current standard benchmarks. We evaluate our method on YOLOv4 (m/s), (Mask/Faster) RCNN, and ViT/Swin-t. We present two ways of benchmarking these models: one based on the overall data size and one based on the worst-case scenario for object detection. The experimental results show that our method always improves object detector results compared to traditional transfer learning, regardless of the underlying architecture.