IEEE Trans Pattern Anal Mach Intell. 2014 Apr;36(4):797-809. doi: 10.1109/TPAMI.2013.163.
Pedestrian detection is of paramount interest for many applications. Most promising detectors rely on discriminatively learnt classifiers, i.e., classifiers trained with annotated samples. However, annotation is a human-intensive and subjective task that is worth minimizing. By using virtual worlds we can automatically obtain precise and rich annotations. Thus, we face the question: can a pedestrian appearance model learnt in realistic virtual worlds work successfully for pedestrian detection in real-world images? Our experiments show that virtual-world-based training can provide excellent testing accuracy in the real world, but it can also suffer from the dataset shift problem, just as real-world-based training does. Accordingly, we have designed a domain adaptation framework, V-AYLA, in which we test different techniques for collecting a few pedestrian samples from the target domain (real world) and combining them with the many examples of the source domain (virtual world) in order to train a domain-adapted pedestrian classifier that will operate in the target domain. V-AYLA achieves the same detection accuracy as training with many human-provided pedestrian annotations and testing on real-world images of the same domain. To the best of our knowledge, this is the first work demonstrating adaptation of virtual and real worlds for developing an object detector.
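The core idea described in the abstract, pooling many source-domain (virtual-world) examples with a few target-domain (real-world) samples to train a single domain-adapted classifier, can be sketched as follows. This is a minimal illustration only, assuming precomputed fixed-length descriptors (random placeholders here) and a plain linear SVM; the sample counts, feature dimensionality, and LinearSVC settings are hypothetical, and V-AYLA's actual sample-collection and combination techniques are more involved than this naive pooling.

```python
# Minimal sketch of source+target pooling for domain adaptation.
# Hypothetical stand-ins: random vectors replace real pedestrian
# descriptors (e.g., HOG), and labels are random; in practice these
# would come from virtual-world renderings and real-world annotations.
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)

# Many annotated samples from the source domain (virtual world).
X_virtual = rng.normal(size=(500, 3780))        # placeholder descriptors
y_virtual = rng.integers(0, 2, size=500)        # 1 = pedestrian, 0 = background

# Only a few annotated samples from the target domain (real world).
X_real = rng.normal(size=(20, 3780))
y_real = rng.integers(0, 2, size=20)

# Combine both domains and train one classifier on the pooled set.
X = np.vstack([X_virtual, X_real])
y = np.concatenate([y_virtual, y_real])

clf = LinearSVC(C=0.01, max_iter=10000)         # hypothetical parameters
clf.fit(X, y)

# The resulting classifier would then score candidate windows in
# target-domain (real-world) images, e.g., via sliding-window detection.
```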