Deep FisherNet for Image Classification.

Publication information

IEEE Trans Neural Netw Learn Syst. 2019 Jul;30(7):2244-2250. doi: 10.1109/TNNLS.2018.2874657. Epub 2018 Nov 5.

Abstract

Despite the great success of convolutional neural networks (CNNs) in image classification on data sets such as CIFAR and ImageNet, their representation power is still somewhat limited when dealing with images that vary widely in size and clutter, a setting in which the Fisher vector (FV) has been shown to be an effective encoding strategy. FV encodes an image by aggregating local descriptors with a universal generative Gaussian mixture model (GMM). FV, however, has limited learning capability, and its parameters are mostly fixed after the codebook is constructed. To combine the best of both worlds, we propose in this brief a neural network structure in which an FV layer is part of a differentiable, end-to-end trainable system; we name the network FisherNet, and it is learnable using backpropagation. FisherNet combines CNN training and FV encoding in a single end-to-end structure. We observe a clear advantage of FisherNet over plain CNN and standard FV in terms of both classification accuracy and computational efficiency on the challenging PASCAL Visual Object Classes (VOC) object classification and emotion image classification tasks.
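The FV encoding step mentioned in the abstract can be illustrated with a minimal NumPy sketch. It assumes the commonly used formulation with a diagonal-covariance GMM and gradients taken with respect to the component means and variances; the abstract does not say which variant FisherNet uses, so this shows a standard, non-learnable FV rather than the paper's FV layer. The function name `fisher_vector` and all parameter names are hypothetical, and the GMM parameters in the usage lines are random placeholders, not a fitted codebook.

```python
import numpy as np

def fisher_vector(X, weights, means, variances):
    """Encode local descriptors X (N, D) with a diagonal GMM (assumed variant).

    weights: (K,) mixture weights; means, variances: (K, D).
    Returns a vector of length 2 * K * D.
    """
    N, _ = X.shape
    diff = X[:, None, :] - means[None, :, :]                 # (N, K, D)

    # Log-density of each descriptor under each Gaussian component.
    log_prob = -0.5 * (np.sum(diff**2 / variances, axis=2)
                       + np.sum(np.log(2.0 * np.pi * variances), axis=1))
    log_joint = np.log(weights) + log_prob                    # (N, K)

    # Posterior responsibilities, computed stably in log space.
    log_joint -= log_joint.max(axis=1, keepdims=True)
    gamma = np.exp(log_joint)
    gamma /= gamma.sum(axis=1, keepdims=True)                 # (N, K)

    # Residuals normalised by the component standard deviations.
    z = diff / np.sqrt(variances)                             # (N, K, D)

    # Gradient statistics w.r.t. means and variances, averaged over descriptors.
    g_mu = np.einsum("nk,nkd->kd", gamma, z)
    g_mu /= N * np.sqrt(weights)[:, None]
    g_var = np.einsum("nk,nkd->kd", gamma, z**2 - 1.0)
    g_var /= N * np.sqrt(2.0 * weights)[:, None]

    return np.concatenate([g_mu.ravel(), g_var.ravel()])

# Usage with placeholder data: 200 descriptors of dimension 8, K = 3 components.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8))
weights = np.full(3, 1.0 / 3.0)
means = rng.normal(size=(3, 8))
variances = np.ones((3, 8))
fv = fisher_vector(X, weights, means, variances)
print(fv.shape)  # (48,)
```

In this sketch the GMM parameters are fixed inputs, which mirrors the limitation the abstract attributes to standard FV. An end-to-end variant would instead treat them as parameters updated by backpropagation.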
