IEEE Trans Pattern Anal Mach Intell. 2023 Jun;45(6):6748-6765. doi: 10.1109/TPAMI.2021.3070543. Epub 2023 May 8.
We present JRDB, a novel egocentric dataset collected from our social mobile manipulator JackRabbot. The dataset includes 64 minutes of annotated multimodal sensor data: stereo cylindrical 360° RGB video at 15 fps, 3D point clouds from two 16-beam Velodyne LiDARs, line 3D point clouds from two Sick LiDARs, an audio signal, RGB-D video at 30 fps, 360° spherical images from a fisheye camera, and encoder values from the robot's wheels. Our dataset incorporates data from traditionally underrepresented scenes such as indoor environments and pedestrian areas, all captured from the ego-perspective of the robot, both stationary and navigating. The dataset has been annotated with over 2.4 million 2D bounding boxes spread over five individual cameras and 1.8 million associated 3D cuboids around all people in the scenes, totaling over 3,500 time-consistent trajectories. Together with the dataset and annotations, we launch a benchmark and metrics for 2D and 3D person detection and tracking. With this dataset, which we plan to extend with further types of annotation in the future, we hope to provide a new source of data and a test bench for research in egocentric robot vision, autonomous navigation, and all perceptual tasks around social robotics in human environments.
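For orientation, the following is a minimal sketch of what a per-frame person label with time-consistent identity might carry, plus the standard CLEAR-MOT MOTA formula commonly used for multi-person tracking benchmarks of this kind. The class and field names here are illustrative assumptions, not the official JRDB annotation schema or evaluation code.

```python
from dataclasses import dataclass

@dataclass
class Box2D:
    """Hypothetical 2D person box in one of the five cylinder cameras."""
    camera_id: int   # 0..4, one per RGB camera on the stereo cylinder
    track_id: int    # time-consistent identity across frames
    x: float         # top-left corner, pixels
    y: float
    w: float         # box width and height, pixels
    h: float

@dataclass
class Cuboid3D:
    """Hypothetical oriented 3D cuboid around a person in the LiDAR frame."""
    track_id: int    # matches the 2D box identity for the same person
    cx: float        # cuboid center, meters
    cy: float
    cz: float
    l: float         # cuboid extents, meters
    w: float
    h: float
    yaw: float       # rotation about the vertical axis, radians

def mota(false_negatives: int, false_positives: int,
         id_switches: int, num_gt: int) -> float:
    """CLEAR-MOT multi-object tracking accuracy:
    MOTA = 1 - (FN + FP + IDSW) / GT, where GT is the total number
    of ground-truth boxes over the sequence."""
    return 1.0 - (false_negatives + false_positives + id_switches) / num_gt
```

A MOTA of 1.0 means a perfect tracker; missed people, spurious detections, and identity switches all subtract from it, which is why time-consistent track IDs in the annotations are essential for this kind of benchmark.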