Braun Markus, Krebs Sebastian, Flohr Fabian, Gavrila Dariu
IEEE Trans Pattern Anal Mach Intell. 2019 Feb 5. doi: 10.1109/TPAMI.2019.2897684.
Big data has played a major part in the success of deep learning in computer vision. Recent work suggests that there is significant further potential to increase object detection performance by using even larger datasets. In this paper, we introduce the EuroCity Persons dataset, which provides a large number of highly diverse, accurate, and detailed annotations of pedestrians, cyclists, and other riders in urban traffic scenes. The images for this dataset were collected on board a moving vehicle in 31 cities across 12 European countries. With over 238,200 person instances manually labeled in over 47,300 images, EuroCity Persons is nearly one order of magnitude larger than datasets previously used for person detection in traffic scenes. The dataset also contains a large number of person orientation annotations (over 211,200). We optimize four state-of-the-art deep learning approaches (Faster R-CNN, R-FCN, SSD, and YOLOv3) to serve as baselines for the new object detection benchmark. We analyze the generalization capabilities of these detectors when trained on the new dataset. We also study the effect of training set size, dataset diversity (day vs. night, geographical region), dataset detail (i.e., availability of object orientation information), and annotation quality on detector performance. Finally, we analyze error sources and discuss the road ahead.