IEEE Trans Image Process. 2017 Sep;26(9):4102-4113. doi: 10.1109/TIP.2017.2710631. Epub 2017 Jun 9.
Automatic recognition of an image's style is important for many applications, including artwork analysis, photo organization, and image retrieval. The traditional convolutional neural network (CNN) approach uses only object features for image style recognition. This approach may not be optimal, because the same object in two images may have different styles. We propose a CNN architecture with two pathways that extract object features and texture features, respectively. The object pathway is the standard CNN architecture, and the texture pathway is interleaved with it, outputting the Gram matrices of the object pathway's intermediate features. The two pathways are trained jointly. In experiments, two deep CNNs pretrained on the ImageNet classification data set, AlexNet and VGG-19, are fine-tuned for this task. For both models, the two-pathway architecture performs much better than either pathway alone, which indicates that the two pathways carry complementary information about an image's style. In particular, the model based on VGG-19 achieves state-of-the-art results on three benchmark data sets: WikiPaintings, Flickr Style, and AVA Style.
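To make the texture pathway concrete, here is a minimal sketch, not the authors' implementation, of how a Gram matrix summarizes one intermediate C x H x W feature map: each channel is flattened to a vector of length N = H * W, and entry G[i][j] is the inner product of channels i and j. The function names `gram_matrix`, `upper_triangle`, and `two_pathway_features` are hypothetical. Dividing by N, keeping only the upper triangle of the symmetric matrix, and combining the pathways by simple concatenation are illustrative assumptions; the abstract does not specify these details. A real system would compute this inside a deep-learning framework so that both pathways can be trained jointly.

```python
from typing import List

Matrix = List[List[float]]


def gram_matrix(feature_map: List[Matrix], normalize: bool = True) -> Matrix:
    """Gram matrix of a C x H x W feature map.

    Each channel is flattened to a vector of length N = H * W, and
    G[i][j] is the inner product of channels i and j. Dividing by N
    (normalize=True) is an assumed convention, not taken from the paper.
    """
    # Flatten every channel's H x W grid into one vector of length N.
    flat = [[v for row in channel for v in row] for channel in feature_map]
    c = len(flat)
    n = len(flat[0])
    scale = 1.0 / n if normalize else 1.0
    gram = [[0.0] * c for _ in range(c)]
    for i in range(c):
        for j in range(i, c):
            dot = sum(a * b for a, b in zip(flat[i], flat[j]))
            # G is symmetric, so one dot product fills both G[i][j] and G[j][i].
            gram[i][j] = gram[j][i] = dot * scale
    return gram


def upper_triangle(gram: Matrix) -> List[float]:
    """Vectorize the upper triangle (diagonal included) of a symmetric matrix."""
    size = len(gram)
    return [gram[i][j] for i in range(size) for j in range(i, size)]


def two_pathway_features(object_features: List[float],
                         intermediate_maps: List[List[Matrix]]) -> List[float]:
    """Hypothetical fusion: object features followed by texture (Gram) features.

    Concatenation is an illustrative assumption; the abstract does not say
    how the two pathways are combined before classification.
    """
    texture: List[float] = []
    for fmap in intermediate_maps:
        texture.extend(upper_triangle(gram_matrix(fmap)))
    return list(object_features) + texture


if __name__ == "__main__":
    # Toy 2-channel, 2 x 2 feature map standing in for an intermediate CNN layer.
    fmap = [[[1, 0], [0, 1]],
            [[1, 1], [0, 0]]]
    print(gram_matrix(fmap))                           # → [[0.5, 0.25], [0.25, 0.5]]
    print(two_pathway_features([0.9, 0.1], [fmap]))    # → [0.9, 0.1, 0.5, 0.25, 0.5]
```

Because the Gram matrix sums over all spatial positions, it records which channels tend to fire together but discards where they fire. This is why it captures texture-like, style-related statistics that complement the object pathway's spatially organized features.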