Zhang Weidong, Zhang Youmei, Song Ran, Liu Ying, Zhang Wei
IEEE Trans Image Process. 2022;31:868-879. doi: 10.1109/TIP.2021.3131025. Epub 2022 Jan 4.
The task of 3D layout estimation in an indoor scene is to predict the holistic 3D structure of the scene from a single RGB image. Ground truth 3D layouts are costly to obtain, which severely restricts learning-based 3D layout estimation approaches. In this paper, we present a novel weakly supervised learning framework that learns the 3D layout effectively using only 2D layout segmentation masks as supervision. We employ a deep neural network to predict the plane parameters and the camera intrinsic parameters from the image. From the predicted plane instances, the 3D layout and the corresponding depth map and 2D segmentation can be generated. The key objectives for learning meaningful plane parameters are the label consistency of the layout segmentation and the depth consistency of border pixels between adjacent planes; through these two objectives, the ground truth 2D layout segmentation is able to supervise the learning of the 3D layout. We further incorporate 3D geometric reasoning and prior knowledge into the learning process to ensure that the learned 3D layout is realistic and reasonable. Experimental results show that our method can produce accurate 3D layout estimates by weakly supervised learning.
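The step from predicted plane parameters and camera intrinsics to a depth map and a 2D segmentation can be illustrated with a small sketch. This is not the authors' implementation; it is a minimal example under stated assumptions: each plane is written as n . X = d in camera coordinates, the camera is a pinhole model with intrinsic matrix K, and the room is convex, so each pixel is assigned to the nearest plane hit in front of the camera. The function name `render_layout` and all numeric values are hypothetical.

```python
import numpy as np


def render_layout(planes, K, height, width):
    """Render a depth map and a plane-label map from plane parameters.

    planes: (P, 4) array of [nx, ny, nz, d], with n . X = d in camera
            coordinates (assumed parameterization, not taken from the paper).
    K:      (3, 3) pinhole intrinsic matrix.
    """
    planes = np.asarray(planes, dtype=float)
    # Pixel centres in homogeneous image coordinates.
    u, v = np.meshgrid(np.arange(width) + 0.5, np.arange(height) + 0.5)
    pixels = np.stack([u, v, np.ones_like(u)], axis=-1)   # (H, W, 3)
    # Back-project to viewing rays with z = 1, so the scale along a ray is depth.
    rays = pixels @ np.linalg.inv(K).T                     # (H, W, 3)
    # A point z * ray lies on a plane when z * (n . ray) = d.
    denom = rays @ planes[:, :3].T                         # (H, W, P)
    with np.errstate(divide="ignore", invalid="ignore"):
        z = planes[:, 3] / denom
        # Discard intersections behind the camera or with parallel planes.
        z = np.where(np.isfinite(z) & (z > 0), z, np.inf)
    # Convex-room assumption: the visible surface is the nearest valid hit.
    labels = z.argmin(axis=-1)
    depth = z.min(axis=-1)
    return depth, labels


# Hypothetical scene: a front wall at z = 4 and side walls at x = -1 and x = 1.
K = np.array([[100.0, 0.0, 50.0],
              [0.0, 100.0, 50.0],
              [0.0, 0.0, 1.0]])
planes = [[0.0, 0.0, 1.0, 4.0],    # front wall
          [-1.0, 0.0, 0.0, 1.0],   # left wall
          [1.0, 0.0, 0.0, 1.0]]    # right wall
depth, labels = render_layout(planes, K, height=100, width=100)
```

In this sketch, `labels` is the kind of 2D segmentation that could be compared against a ground truth layout mask, and because adjacent planes meet along a shared 3D line, `depth` is continuous across the borders between them, which is the property the depth-consistency objective encourages.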