Wu Shangzhe, Rupprecht Christian, Vedaldi Andrea
IEEE Trans Pattern Anal Mach Intell. 2023 Apr;45(4):5268-5281. doi: 10.1109/TPAMI.2021.3076536. Epub 2023 Mar 7.
We propose a method to learn 3D deformable object categories from raw single-view images, without external supervision. The method is based on an autoencoder that factors each input image into depth, albedo, viewpoint and illumination. To disentangle these components without supervision, we use the fact that many object categories have, at least approximately, a symmetric structure. We show that reasoning about illumination allows us to exploit the underlying object symmetry even when the appearance is not symmetric due to shading. Furthermore, we model objects that are probably, but not certainly, symmetric by predicting a symmetry probability map, learned end-to-end with the other components of the model. Our experiments show that this method recovers the 3D shape of human faces, cat faces and cars from single-view images with high accuracy, without any supervision or a prior shape model. On benchmarks, we demonstrate superior accuracy compared to another method that uses supervision at the level of 2D image correspondences.
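The factoring described in the abstract can be made concrete with a short sketch. The following is a minimal, illustrative PyTorch sketch, not the authors' released implementation: the class and function names, layer sizes, the 64x64 input resolution, and the 4-parameter lighting model are all assumptions made for illustration. It shows an autoencoder that predicts depth, albedo, viewpoint, lighting and a symmetry probability map, plus a Lambertian shading step that illustrates why modelling illumination lets the horizontally flipped (symmetric) branch be compared against an asymmetrically shaded input.

```python
# Minimal sketch of the photo-geometric factoring described in the abstract.
# All names, layer sizes and the 4-parameter lighting model are illustrative
# assumptions, not the authors' released implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F

class PhotoGeometricAutoencoder(nn.Module):
    def __init__(self, img_size=64):
        super().__init__()
        def conv_head(out_ch):  # tiny stand-in for the real encoder-decoder networks
            return nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                                 nn.Conv2d(16, out_ch, 3, padding=1))
        self.depth_net = conv_head(1)    # per-pixel depth
        self.albedo_net = conv_head(3)   # per-pixel reflectance
        self.conf_net = conv_head(1)     # symmetry probability map
        self.view_net = nn.Sequential(nn.Flatten(), nn.Linear(3 * img_size**2, 6))   # 6-DoF viewpoint
        self.light_net = nn.Sequential(nn.Flatten(), nn.Linear(3 * img_size**2, 4))  # ambient, diffuse, light dir (x, y)

    def forward(self, img):
        depth = self.depth_net(img)
        albedo = self.albedo_net(img)
        conf = torch.sigmoid(self.conf_net(img))   # high where the object is likely symmetric
        view = self.view_net(img)
        light = self.light_net(img)
        # Symmetry assumption: a horizontally flipped copy of depth and albedo
        # should explain the same image; the confidence map down-weights the
        # flipped branch wherever the object is not actually symmetric.
        depth_flip = torch.flip(depth, dims=[3])
        albedo_flip = torch.flip(albedo, dims=[3])
        return depth, albedo, view, light, conf, depth_flip, albedo_flip

def lambertian_shading(normals, light):
    """Why illumination matters: shading depends on the light direction, so the
    appearance of a symmetric object is generally asymmetric. Shading the
    flipped depth/albedo with the same light makes the two reconstructions
    comparable. normals: (B, 3, H, W) unit normals derived from depth; light: (B, 4)."""
    k_a, k_d = light[:, 0:1], light[:, 1:2]
    direction = torch.cat([light[:, 2:4], torch.ones_like(light[:, 0:1])], dim=1)
    direction = F.normalize(direction, dim=1)                       # unit light direction
    n_dot_l = (normals * direction[:, :, None, None]).sum(1, keepdim=True).clamp(min=0)
    return k_a[:, :, None, None] + k_d[:, :, None, None] * n_dot_l  # (B, 1, H, W)
```

Under these assumptions, a reconstruction would be rendered as albedo times shading, reprojected with the predicted viewpoint, and a photometric loss between the input and both the original and flipped reconstructions, weighted by the symmetry probability map, would train all components end-to-end, as the abstract describes.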