Department of Computer Science and Engineering, University of California, San Diego, La Jolla, CA 92093-0404
Neural Comput. 2021 Jan;33(1):194-226. doi: 10.1162/neco_a_01336. Epub 2020 Oct 20.
We investigate a latent variable model for multinomial classification inspired by recent capsule architectures for visual object recognition (Sabour, Frosst, & Hinton, 2017). Capsule architectures use vectors of hidden unit activities to encode the pose of visual objects in an image, and they use the lengths of these vectors to encode the probabilities that objects are present. Probabilities from different capsules can also be propagated through deep multilayer networks to model the part-whole relationships of more complex objects. Notwithstanding the promise of these networks, there still remains much to understand about capsules as primitive computing elements in their own right. In this letter, we study the problem of capsule regression, a higher-dimensional analog of logistic, probit, and softmax regression in which class probabilities are derived from vectors of competing magnitude. To start, we propose a simple capsule architecture for multinomial classification: the architecture has one capsule per class, and each capsule uses a weight matrix to compute the vector of hidden unit activities for patterns it seeks to recognize. Next, we show how to model these hidden unit activities as latent variables, and we use a squashing nonlinearity to convert their magnitudes as vectors into normalized probabilities for multinomial classification. When different capsules compete to recognize the same pattern, the squashing nonlinearity induces nongaussian terms in the posterior distribution over their latent variables. Nevertheless, we show that exact inference remains tractable and use an expectation-maximization procedure to derive least-squares updates for each capsule's weight matrix. We also present experimental results to demonstrate how these ideas work in practice.
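The forward pass described above (one capsule per class, a weight matrix per capsule, and a squashing step that turns vector magnitudes into normalized class probabilities) can be sketched in a few lines of Python. The abstract does not state the exact form of the squashing nonlinearity, so this sketch assumes one natural choice: each class probability is the squared length of that capsule's activity vector divided by the sum of squared lengths over all capsules. The function names, the sizes, and the uniform fallback for a zero input are hypothetical, chosen only for illustration. The latent variable model and the expectation-maximization updates are not shown.

```python
import random


def capsule_activity(W, x):
    """Hidden unit activities of one capsule: the matrix-vector product W x."""
    return [sum(w * xj for w, xj in zip(row, x)) for row in W]


def class_probabilities(weights, x):
    """Assumed squashing: p(class i | x) = ||W_i x||^2 / sum_j ||W_j x||^2.

    `weights` holds one weight matrix (a list of rows) per class.
    """
    sq_lengths = [sum(h * h for h in capsule_activity(W, x)) for W in weights]
    total = sum(sq_lengths)
    if total == 0.0:
        # Every capsule is silent; fall back to a uniform distribution.
        # This fallback is a choice made for this sketch only.
        return [1.0 / len(weights)] * len(weights)
    return [s / total for s in sq_lengths]


# Hypothetical sizes: 3 classes, 2 hidden units per capsule, 4 input features.
rng = random.Random(0)
weights = [
    [[rng.gauss(0.0, 1.0) for _ in range(4)] for _ in range(2)]
    for _ in range(3)
]
x = [0.5, -1.0, 2.0, 0.1]

probs = class_probabilities(weights, x)
# Predict the class whose capsule produced the longest activity vector.
predicted = max(range(len(probs)), key=probs.__getitem__)
```

Under this assumed form, the capsules compete through the shared denominator: lengthening one capsule's activity vector raises its own probability and lowers all the others, which is the sense in which the class probabilities come from vectors of competing magnitude.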