IEEE Trans Pattern Anal Mach Intell. 2022 Nov;44(11):7380-7399. doi: 10.1109/TPAMI.2021.3119563. Epub 2022 Oct 4.
Drones, or general UAVs, equipped with cameras have been rapidly deployed across a wide range of applications, including agriculture, aerial photography, and surveillance. Consequently, automatic understanding of visual data collected from drones is in high demand, bringing computer vision and drones ever closer together. To promote and track the development of object detection and tracking algorithms, we have organized three challenge workshops in conjunction with ECCV 2018, ICCV 2019, and ECCV 2020, attracting more than 100 teams from around the world. We provide a large-scale drone-captured dataset, VisDrone, which includes four tracks: (1) image object detection, (2) video object detection, (3) single object tracking, and (4) multi-object tracking. In this paper, we first present a thorough review of object detection and tracking datasets and benchmarks, and discuss the challenges of collecting large-scale drone-based object detection and tracking datasets with fully manual annotations. We then describe our VisDrone dataset, which was captured over various urban and suburban areas of 14 different cities across China, from north to south. As the largest such dataset published to date, VisDrone enables extensive evaluation and investigation of visual analysis algorithms for the drone platform. We provide a detailed analysis of the current state of large-scale object detection and tracking on drones, summarize the challenge, and propose future directions. We expect the benchmark to greatly boost research and development in video analysis on drone platforms. All the datasets and experimental results can be downloaded from https://github.com/VisDrone/VisDrone-Dataset.