Khan Salman H, Hayat Munawar, Bennamoun Mohammed, Togneri Roberto, Sohel Ferdous A
IEEE Trans Image Process. 2016 Jul;25(7):3372-3383. doi: 10.1109/TIP.2016.2567076. Epub 2016 May 11.
Indoor scene recognition is a multi-faceted and challenging problem due to the diverse intra-class variations and the confusing inter-class similarities that characterize such scenes. This paper presents a novel approach that exploits rich mid-level convolutional features to categorize indoor scenes. Traditional convolutional features retain the global spatial structure, which is a desirable property for general object recognition. We, however, argue that the structure-preserving property of the convolutional neural network activations is not of substantial help in the presence of large variations in scene layouts, e.g., in indoor scenes. We propose to transform the structured convolutional activations to another highly discriminative feature space. The representation in the transformed space not only incorporates the discriminative aspects of the target data set but also encodes the features in terms of the general object categories that are present in indoor scenes. To this end, we introduce a new large-scale data set of 1300 object categories that are commonly present in indoor scenes. Our proposed approach achieves a significant performance boost over the previous state-of-the-art approaches on five major scene classification data sets.
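The abstract does not specify how the structured activations are transformed. As a rough illustration only, here is a minimal sketch of the general idea: activations of image patches are re-expressed as responses to object categories, and spatial order is discarded by pooling. It assumes each patch already has a D-dimensional CNN activation and that a hypothetical set of per-category "object templates" exists, such as linear classifier weights trained on an object data set. The function name `orderless_object_encoding` and the choice of max-pooling are illustrative assumptions, not the authors' method.

```python
from typing import List, Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def orderless_object_encoding(
    patch_features: Sequence[Sequence[float]],
    object_templates: Sequence[Sequence[float]],
) -> List[float]:
    """Hypothetical transform from patch activations to an object-category space.

    patch_features:   one D-dim activation per image patch; their order
                      (i.e. the spatial layout) is deliberately ignored.
    object_templates: one D-dim vector per object category (assumed, e.g.
                      weights of a linear object classifier).
    Returns a C-dim descriptor: for each category, the strongest response
    over all patches (max-pooling), which is invariant to patch order.
    """
    if not patch_features:
        raise ValueError("need at least one patch")
    return [max(dot(p, t) for p in patch_features) for t in object_templates]


if __name__ == "__main__":
    patches = [[1.0, 0.0], [0.0, 2.0], [0.5, 0.5]]
    templates = [[1.0, 0.0], [0.0, 1.0]]
    # Rearranging the patches (a different "layout") yields the same descriptor.
    print(orderless_object_encoding(patches, templates))
    print(orderless_object_encoding(list(reversed(patches)), templates))
```

Because the pooled descriptor is indexed by object category rather than by image position, two scenes containing the same objects in different arrangements map to the same vector, which illustrates why a layout-agnostic representation may suit indoor scenes with large layout variation.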