Zhao Yongheng, Fang Guangchi, Guo Yulan, Guibas Leonidas, Tombari Federico, Birdal Tolga
Informatics at Technische Universität München, Munich, Germany.
School of Electronics and Communication Engineering, The Shenzhen Campus of Sun Yat-sen University, Shenzhen, China.
Int J Comput Vis. 2022;130(9):2321-2336. doi: 10.1007/s11263-022-01632-6. Epub 2022 Jul 30.
We present a method for learning robust, flexible, and generalizable 3D object representations without requiring heavy annotation effort or supervision. Unlike conventional 3D generative models, our algorithm aims to build a structured latent space in which certain factors of shape variation, such as object parts, are disentangled into independent sub-spaces. Our novel decoder then acts on these individual latent sub-spaces (i.e. capsules) using deconvolution operators to reconstruct 3D points in a self-supervised manner. We further introduce a cluster loss ensuring that the points reconstructed by a single capsule remain local and do not spread across the object uncontrollably. These contributions allow our network to tackle the challenging tasks of part segmentation, part interpolation/replacement, and correspondence estimation across rigid/non-rigid shapes, both across and within categories. Our extensive evaluations on ShapeNet objects and human scans demonstrate that our network can learn generic representations that are robust and useful in many applications.
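The cluster loss described above penalizes how far each capsule's reconstructed points stray from that capsule's centroid. The abstract does not give the exact formulation, so the following is a minimal sketch under the assumption that locality is measured as mean squared distance to the per-capsule centroid; the function name and data layout are hypothetical.

```python
def cluster_loss(capsule_points):
    """Hypothetical sketch of a per-capsule locality penalty.

    capsule_points: list of K point sets, one per capsule, where each
    point set is a list of (x, y, z) tuples reconstructed by that capsule.
    Returns the mean squared distance of points to their capsule centroid,
    so tightly clustered reconstructions yield a lower loss.
    """
    total, count = 0.0, 0
    for pts in capsule_points:
        n = len(pts)
        # Centroid of this capsule's reconstructed points.
        cx = sum(p[0] for p in pts) / n
        cy = sum(p[1] for p in pts) / n
        cz = sum(p[2] for p in pts) / n
        # Accumulate squared distance of each point to the centroid.
        for (x, y, z) in pts:
            total += (x - cx) ** 2 + (y - cy) ** 2 + (z - cz) ** 2
            count += 1
    return total / count
```

Minimizing such a term alongside the reconstruction loss discourages any single capsule from scattering its points across the whole object, which is what makes the capsules usable for part segmentation.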