Yang Xianghui, Lin Guosheng, Zhou Luping
IEEE Trans Image Process. 2023;32:3746-3758. doi: 10.1109/TIP.2023.3279661. Epub 2023 Jul 7.
Single-view 3D object reconstruction is a fundamental and challenging computer vision task that aims to recover 3D shapes from single-view RGB images. Most existing deep-learning-based reconstruction methods are trained and evaluated on the same categories, and they do not work well on objects from novel categories that were not seen during training. Focusing on this issue, this paper tackles Single-view 3D Mesh Reconstruction to study model generalization to unseen categories and to encourage models to reconstruct objects literally. Specifically, we propose an end-to-end two-stage network, GenMesh, to break the category boundaries in reconstruction. First, we factorize the complicated image-to-mesh mapping into two simpler mappings, i.e., an image-to-point mapping and a point-to-mesh mapping, where the latter is mainly a geometric problem and less dependent on object categories. Second, we devise a local feature sampling strategy in the 2D and 3D feature spaces to capture the local geometry shared across objects and thereby enhance model generalization. Third, apart from the traditional point-to-point supervision, we introduce a multi-view silhouette loss to supervise the surface generation process, which provides additional regularization and further relieves overfitting. Experimental results show that our method significantly outperforms existing works on ShapeNet and Pix3D under different scenarios and various metrics, especially for novel objects.
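The abstract names two forms of supervision: traditional point-to-point supervision and a multi-view silhouette loss. The paper's exact formulations appear in the full text; the sketch below is only a rough illustration of the two ideas under common simplifying assumptions — a symmetric Chamfer distance for point-to-point supervision, and an IoU-style loss comparing per-view soft silhouette masks. The function names and the specific silhouette formulation are assumptions, not the authors' implementation.

```python
import numpy as np

def chamfer_distance(p, q):
    """Symmetric Chamfer distance between two point clouds.

    p: (N, 3) predicted points, q: (M, 3) ground-truth points.
    A common stand-in for "point-to-point" supervision on point sets.
    """
    # Pairwise Euclidean distances, shape (N, M), via broadcasting.
    d = np.linalg.norm(p[:, None, :] - q[None, :, :], axis=-1)
    # Nearest-neighbour distance in each direction, averaged.
    return d.min(axis=1).mean() + d.min(axis=0).mean()

def silhouette_loss(pred_masks, gt_masks, eps=1e-8):
    """IoU-style multi-view silhouette loss (illustrative formulation).

    pred_masks, gt_masks: (V, H, W) soft silhouettes in [0, 1],
    one per view; in practice pred_masks would come from a
    differentiable renderer applied to the generated surface.
    """
    inter = (pred_masks * gt_masks).sum(axis=(1, 2))
    union = (pred_masks + gt_masks - pred_masks * gt_masks).sum(axis=(1, 2))
    # 1 - IoU per view, averaged over views: 0 for a perfect match.
    return float((1.0 - inter / (union + eps)).mean())
```

Both terms vanish for a perfect reconstruction, and the silhouette term regularizes the surface from multiple viewpoints rather than only at sampled points.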