Nazabal Alfredo, Tsagkas Nikolaos, Williams Christopher K I
Amazon Development Centre Scotland, Edinburgh EH1 3EG, U.K.
School of Informatics, University of Edinburgh, Edinburgh EH8 9AB, U.K.
Neural Comput. 2023 Mar 18;35(4):727-761. doi: 10.1162/neco_a_01564.
Capsule networks (see Hinton et al., 2018) aim to encode knowledge of, and reason about, the relationship between an object and its parts. In this letter, we specify a generative model for such data and derive a variational algorithm for inferring the transformation of each model object in a scene and the assignment of observed parts to objects. We derive a learning algorithm for the object models based on variational expectation maximization (Jordan et al., 1999). We also study an alternative inference algorithm based on the RANSAC method of Fischler and Bolles (1981). We apply these inference methods to data generated from multiple geometric objects such as squares and triangles ("constellations") and to data from a parts-based model of faces. Recent work by Kosiorek et al. (2019) has used amortized inference via stacked capsule autoencoders to tackle this problem; our results show that we significantly outperform them where comparisons are possible (on the constellations data).
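To make the RANSAC idea concrete, the following is a minimal sketch of generic RANSAC in the spirit of Fischler and Bolles (1981), applied to a single 2D template object in a scene with clutter. It is not the inference algorithm of the letter. It assumes the object undergoes a similarity transform (scale, rotation, translation), represents 2D points as complex numbers so that the transform is `s = a * t + b`, and scores a hypothesis by counting template points that land within a tolerance of some scene point. The names `fit_similarity` and `ransac_similarity`, the template shape, the tolerance, and the iteration count are all illustrative assumptions.

```python
import numpy as np


def fit_similarity(t1, t2, s1, s2):
    """Solve s = a * t + b exactly from two template/scene point pairs.

    Points are complex numbers x + iy; `a` encodes scale and rotation,
    `b` encodes translation. (Hypothetical helper for this sketch.)
    """
    a = (s2 - s1) / (t2 - t1)
    b = s1 - a * t1
    return a, b


def ransac_similarity(template, scene, n_iter=1000, tol=0.05, seed=0):
    """Generic RANSAC: sample minimal correspondences, keep the best hypothesis."""
    rng = np.random.default_rng(seed)
    best_a, best_b, best_count = None, None, -1
    for _ in range(n_iter):
        # Minimal sample: two distinct template points and two distinct scene points.
        i, j = rng.choice(len(template), size=2, replace=False)
        k, l = rng.choice(len(scene), size=2, replace=False)
        a, b = fit_similarity(template[i], template[j], scene[k], scene[l])
        # Transform the whole template and measure distances to every scene point.
        pred = a * template + b
        dist = np.abs(pred[:, None] - scene[None, :])
        # Count template points with a scene point within `tol`
        # (a simple score; it does not enforce one-to-one assignment).
        count = int((dist.min(axis=1) < tol).sum())
        if count > best_count:
            best_a, best_b, best_count = a, b, count
    return best_a, best_b, best_count


# Illustrative data: an asymmetric quadrilateral template, one transformed
# copy of it in the scene, and two clutter points (all values are made up).
template = np.array([0, 2, 2 + 1j, 0.5j])
a_true = 1.5 * np.exp(0.4j)   # scale 1.5, rotation 0.4 rad
b_true = 3 + 1j               # translation (3, 1)
scene = np.concatenate([a_true * template + b_true,
                        np.array([-6 - 5j, 10 + 9j])])

a_hat, b_hat, n_inliers = ransac_similarity(template, scene)
```

The template here is deliberately asymmetric: a square would be mapped onto itself by 90-degree rotations, so several transforms would explain the same points equally well. Handling multiple objects and soft part-to-object assignments, as the letter does, requires more than this single-object sketch.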