Bamdad Marziyeh, Hutter Hans-Peter, Darvishy Alireza
Institute of Computer Science, Zurich University of Applied Sciences, 8400 Winterthur, Switzerland.
Department of Informatics, University of Zurich, 8050 Zurich, Switzerland.
Sensors (Basel). 2024 Dec 21;24(24):8164. doi: 10.3390/s24248164.
Simultaneous localization and mapping (SLAM) techniques can support navigation for visually impaired people, but the development of robust SLAM solutions for crowded spaces is limited by the lack of realistic datasets. To address this, we introduce InCrowd-VI, a novel visual-inertial dataset specifically designed for human navigation in indoor, pedestrian-rich environments. Recorded using Meta Aria Project glasses, it captures realistic scenarios without environmental control. InCrowd-VI comprises 58 sequences totaling 5 km of trajectories and 1.5 h of recording time, and includes RGB images, stereo images, and IMU measurements. The dataset captures important challenges such as pedestrian occlusions, varying crowd densities, complex layouts, and lighting changes. Ground-truth trajectories, accurate to approximately 2 cm, are provided with the dataset, derived from the Meta Aria Project's machine perception SLAM service. In addition, a semi-dense 3D point cloud of the scene is provided for each sequence. Evaluating state-of-the-art visual odometry (VO) and SLAM algorithms on InCrowd-VI revealed severe performance limitations in these realistic scenarios. Under challenging conditions, systems exceeded the 0.5 m localization accuracy requirement and the 1% drift threshold, with classical methods drifting by up to 5-10%. While deep learning-based approaches maintained high pose-estimation coverage (>90%), they failed to achieve the real-time processing speeds necessary for walking-pace navigation. These results demonstrate the need for, and the value of, a new dataset to advance SLAM research for visually impaired navigation in complex indoor environments.
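The abstract's evaluation criteria (0.5 m localization accuracy, 1% drift relative to trajectory length) correspond to standard trajectory-error metrics. A minimal Python sketch of the two quantities is shown below, assuming the estimated and ground-truth trajectories are already time-aligned and expressed in the same reference frame (e.g., after an SE(3)/Umeyama alignment); the function names and array layout are illustrative, not taken from the paper or its evaluation code.

```python
import numpy as np

def ate_rmse(est: np.ndarray, gt: np.ndarray) -> float:
    """Absolute trajectory error (RMSE) between time-aligned estimated and
    ground-truth positions, each an (N, 3) array of xyz points in meters.
    Compared against an accuracy budget such as the 0.5 m requirement."""
    return float(np.sqrt(np.mean(np.sum((est - gt) ** 2, axis=1))))

def drift_percent(est: np.ndarray, gt: np.ndarray) -> float:
    """End-point drift as a percentage of total ground-truth path length,
    the kind of quantity compared against a 1% drift threshold.
    (One common drift definition; the paper may define it differently.)"""
    path_len = np.sum(np.linalg.norm(np.diff(gt, axis=0), axis=1))
    end_err = np.linalg.norm(est[-1] - gt[-1])
    return float(100.0 * end_err / path_len)
```

Under these definitions, a classical method drifting 5-10% on a 100 m sequence would end 5-10 m away from the ground-truth endpoint, which illustrates why such systems fall short of the stated requirements in crowded scenes.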