Lenc Karel, Vedaldi Andrea
Department of Engineering Science, University of Oxford, Oxford, UK.
Int J Comput Vis. 2019;127(5):456-476. doi: 10.1007/s11263-018-1098-y. Epub 2018 May 18.
Despite the importance of image representations such as histograms of oriented gradients and deep Convolutional Neural Networks (CNNs), our theoretical understanding of them remains limited. To help fill this gap, we investigate two key mathematical properties of representations: equivariance and equivalence. Equivariance studies how transformations of the input image are encoded by the representation, with invariance being the special case in which a transformation has no effect. Equivalence studies whether two representations, for example two different parameterizations of a CNN, two different layers, or two different CNN architectures, capture the same visual information. We propose a number of methods for establishing these properties empirically, including the introduction of transformation and stitching layers in CNNs. Applying these methods to popular representations reveals insightful aspects of their structure, including at which layers of a CNN particular geometric invariances are achieved and how various CNN architectures differ. We identify several predictors of geometric and architectural compatibility, including the spatial resolution of the representation and the complexity and depth of the models. While the focus of the paper is theoretical, direct applications to structured-output regression are also demonstrated.
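To make the empirical equivariance test concrete, the sketch below learns a "transformation layer" M_g so that the features of a transformed image approximately match the transformed features of the original, phi(g x) ~ M_g phi(x). This is a minimal illustrative sketch, not the authors' released code: the choice of AlexNet, the cut-off after the second convolutional block, the horizontal-flip transform g, and the 1x1-convolution form of M_g are all assumptions made for this example.

    import torch
    import torch.nn as nn
    import torchvision.models as models

    # phi: a frozen representation, here the first two conv blocks of AlexNet
    phi = models.alexnet(weights=None).features[:6]
    phi.eval()
    for p in phi.parameters():
        p.requires_grad_(False)

    def g(x):  # input transformation g: horizontal flip
        return torch.flip(x, dims=[-1])

    with torch.no_grad():
        c = phi(torch.zeros(1, 3, 224, 224)).shape[1]  # feature channels

    # Transformation layer M_g: channel mixing (1x1 conv) composed with the
    # corresponding spatial flip of the feature map
    M_g = nn.Conv2d(c, c, kernel_size=1)
    opt = torch.optim.Adam(M_g.parameters(), lr=1e-3)

    for step in range(200):  # toy training loop on random stand-in images
        x = torch.rand(8, 3, 224, 224)
        with torch.no_grad():
            target = phi(g(x))                         # phi(g x)
        pred = torch.flip(M_g(phi(x)), dims=[-1])      # flip(M_g phi(x))
        loss = nn.functional.mse_loss(pred, target)    # equivariance residual
        opt.zero_grad()
        loss.backward()
        opt.step()

    print(f"final equivariance residual: {loss.item():.4f}")

A small residual after training suggests the layer's features transform (approximately linearly) with g; a residual that stays high even with a learned M_g suggests the information needed to realize the transformation has been discarded, i.e. partial invariance. A stitching layer for the equivalence test can be sketched analogously, with a 1x1 convolution trained to map the features of one network into the input space of a later stage of another.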