Ramirez Pierluigi Zama, Costanzino Alex, Tosi Fabio, Poggi Matteo, Salti Samuele, Mattoccia Stefano, Stefano Luigi Di
IEEE Trans Pattern Anal Mach Intell. 2024 Jan;46(1):85-102. doi: 10.1109/TPAMI.2023.3323858. Epub 2023 Dec 5.
Depth estimation from images now yields outstanding results, both in in-domain accuracy and in generalization. However, we identify two main challenges that remain open in this field: dealing with non-Lambertian materials and effectively processing high-resolution images. To address them, we propose a novel dataset that includes accurate and dense ground-truth labels at high resolution, featuring scenes containing several specular and transparent surfaces. Our acquisition pipeline leverages a novel deep space-time stereo framework, enabling easy and accurate labeling with sub-pixel precision. The dataset is composed of 606 samples collected in 85 different scenes. Each sample includes both a high-resolution pair (12 Mpx) and an unbalanced stereo pair (Left: 12 Mpx, Right: 1.1 Mpx), the latter typical of modern mobile devices that mount sensors with different resolutions. Additionally, we provide manually annotated material segmentation masks and 15K unlabeled samples. The dataset is split into a training set and two test sets, the latter devoted to the evaluation of stereo and monocular depth estimation networks. Our experiments highlight the open challenges and future research directions in this field.
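To give a sense of how unbalanced the 12 Mpx / 1.1 Mpx pair is, the following minimal Python sketch converts the two megapixel counts into a per-axis scale factor and shows how a disparity measured in pixels would shrink at the lower resolution. It assumes that both images share the same aspect ratio and field of view, which the abstract does not state. The function names `linear_scale_factor` and `rescale_disparity` are hypothetical and are not part of any released dataset tooling.

```python
import math


def linear_scale_factor(high_mpx: float, low_mpx: float) -> float:
    """Per-axis scale between two images.

    Assumption (not from the abstract): both images have the same
    aspect ratio and field of view, so the pixel-count ratio is the
    square of the per-axis ratio.
    """
    if high_mpx <= 0 or low_mpx <= 0:
        raise ValueError("megapixel counts must be positive")
    return math.sqrt(high_mpx / low_mpx)


def rescale_disparity(disparity_px: float, scale: float) -> float:
    """Express a high-resolution disparity (in pixels) at the lower resolution.

    Disparity in pixels scales linearly with image width, so it is
    divided by the per-axis scale factor.
    """
    return disparity_px / scale


# Resolutions quoted in the abstract: Left 12 Mpx, Right 1.1 Mpx.
scale = linear_scale_factor(12.0, 1.1)
print(f"pixel-count ratio: {12.0 / 1.1:.2f}")
print(f"per-axis scale:    {scale:.2f}")
```

Under these assumptions the right image has roughly one eleventh of the pixels of the left one, which corresponds to a per-axis factor of about 3.3. A method operating on the unbalanced pair therefore has to bridge that gap, for example by resampling one view, before disparities in the two images are comparable.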