Zhao Bin, Han Pengfei, Li Xuelong
IEEE Trans Pattern Anal Mach Intell. 2024 Apr;46(4):2545-2554. doi: 10.1109/TPAMI.2023.3335953. Epub 2024 Mar 6.
Satellites can capture high-resolution videos, which makes vehicle perception from orbit possible. Compared with street surveillance cameras, dashboard cameras, and other equipment, satellite videos provide a much broader, city-scale view, so that the global dynamic traffic scene is captured and displayed. Traffic monitoring from satellite is a new task with many potential applications, including traffic jam prediction, path planning, and vehicle dispatching. In practice, limited by the resolution and the view, the captured vehicles are tiny (a few pixels) and move slowly. Worse still, these satellites operate in Low Earth Orbit (LEO) in order to capture such high-resolution videos, so the background is also moving. Under these circumstances, traffic monitoring from the satellite view is an extremely challenging task. To attract more researchers to this field, we build a large-scale benchmark for traffic monitoring from satellite. It supports several tasks, including tiny object detection, counting, and density estimation. The dataset is constructed from 12 satellite videos and 14 synthetic videos recorded from GTA-V. They are split into 408 video clips, which contain 7,336 real satellite images and 1,960 synthetic images. In total, 128,801 vehicles are annotated, and the number of vehicles per image varies from 0 to 101. Several classic and state-of-the-art approaches in traditional computer vision are evaluated on the dataset in order to compare their performance, analyze the challenges of this task, and discuss future prospects.
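The abstract lists counting and density estimation among the supported tasks but does not say how density targets are derived from the annotations. As a minimal sketch, assuming the convention common in the counting literature (one unit-mass Gaussian per annotated vehicle center, so that the map sums to the vehicle count), a target could be built as follows. The function name `density_map`, the `sigma` value, and the sample points are hypothetical and are not taken from the benchmark.

```python
import numpy as np

def density_map(points, height, width, sigma=1.5):
    """Build a density map from point annotations (hypothetical helper).

    points: iterable of (x, y) vehicle centers in pixel coordinates,
            assumed to lie inside the image.
    sigma:  Gaussian spread in pixels; small, since vehicles span a few pixels.
    """
    ys, xs = np.mgrid[0:height, 0:width]
    density = np.zeros((height, width), dtype=np.float64)
    for x, y in points:
        g = np.exp(-((xs - x) ** 2 + (ys - y) ** 2) / (2.0 * sigma ** 2))
        # Normalize so each vehicle contributes a total mass of exactly 1,
        # even when the Gaussian is truncated by the image border.
        density += g / g.sum()
    return density

# Illustrative annotations only, not real benchmark data.
points = [(10.0, 12.0), (11.5, 12.5), (40.0, 5.0)]
dmap = density_map(points, height=32, width=64)
estimated_count = dmap.sum()  # integrating the map recovers the count
```

Normalizing each kernel over the image keeps the integral of the map equal to the annotated count, which is what lets a single regression target serve both the counting and the density estimation tasks.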