IEEE Trans Pattern Anal Mach Intell. 2017 Dec;39(12):2335-2348. doi: 10.1109/TPAMI.2017.2651061. Epub 2017 Jan 10.
Derived from the gradient vector of a generative model of local features, Fisher vector coding (FVC) has been identified as an effective coding method for image classification. Most, if not all, FVC implementations employ the Gaussian mixture model (GMM) as the generative model for local features. However, the representational power of a GMM can be limited because it essentially assumes that local features can be characterized by a fixed number of feature prototypes, and the number of prototypes is usually small in FVC. To alleviate this limitation, in this work we break with the convention that a local feature is drawn from one of a few Gaussian distributions. Instead, we adopt a compositional mechanism which assumes that a local feature is drawn from a Gaussian distribution whose mean vector is a linear combination of multiple key components, with the combination weight being a latent random variable. In doing so, we greatly enhance the representational power of the generative model underlying FVC. To implement our idea, we design two particular generative models following this compositional approach. In our first model, the mean vector is sampled from the subspace spanned by a set of bases and the combination weight is drawn from a Laplace distribution. In our second model, we further assume that a local feature is composed of a discriminative part and a residual part. As a result, a local feature is generated by a linear combination of discriminative part bases and residual part bases. The decomposition into discriminative and residual parts is guided by a pre-trained supervised coding method. By calculating the gradient vector of the proposed models, we derive two new Fisher vector coding strategies. The first is termed Sparse Coding-based Fisher Vector Coding (SCFVC) and can be used as a substitute for traditional GMM-based FVC. The second is termed Hybrid Sparse Coding-based Fisher Vector Coding (HSCFVC) since it combines the merits of both pre-trained supervised coding methods and FVC. Using pre-trained Convolutional Neural Network (CNN) activations as local features, we experimentally demonstrate that the proposed methods are superior to traditional GMM-based FVC and achieve state-of-the-art performance in various image classification tasks.
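The first model described above can be illustrated with a minimal sketch. The abstract gives no formulas, so everything concrete here is an assumption made for illustration: the latent Laplace-distributed weight u is replaced by its MAP estimate (an L1-regularized least-squares problem), the Gaussian has unit variance, the gradient of the log-likelihood with respect to the bases B is then taken as (x - B u) u^T, and per-feature gradients are sum-pooled. The function names `sparse_code_ista` and `scfvc_like_encode`, the ISTA solver, and the value of `lam` are hypothetical choices, not the authors' implementation.

```python
import numpy as np


def sparse_code_ista(x, B, lam=0.1, n_iter=200):
    """Approximate argmin_u 0.5*||x - B u||^2 + lam*||u||_1 using ISTA.

    Assumption: this MAP estimate stands in for the latent Laplace weight.
    """
    # Lipschitz constant of the gradient of the quadratic term.
    L = np.linalg.norm(B, 2) ** 2
    u = np.zeros(B.shape[1])
    for _ in range(n_iter):
        grad = B.T @ (B @ u - x)          # gradient of the quadratic term
        z = u - grad / L                  # gradient step
        u = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)  # soft threshold
    return u


def scfvc_like_encode(X, B, lam=0.1):
    """Sum-pool the per-feature gradients (x - B u) u^T into one vector.

    X: (n, d) local features; B: (d, m) bases. Returns a vector of length d*m.
    """
    G = np.zeros_like(B)
    for x in X:
        u = sparse_code_ista(x, B, lam)
        G += np.outer(x - B @ u, u)       # residual times sparse code
    return G.ravel()


# Toy usage with random data (hypothetical sizes: 5 features, d=8, m=16).
rng = np.random.default_rng(0)
B = rng.standard_normal((8, 16))
B /= np.linalg.norm(B, axis=0, keepdims=True)   # unit-norm bases
X = rng.standard_normal((5, 8))
v = scfvc_like_encode(X, B)
print(v.shape)  # (128,)
```

Under these assumptions the image-level code has dimension d times m, so its size grows with the number of bases; the sketch applies no normalization to the pooled vector.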