Understanding Image Representations by Measuring Their Equivariance and Equivalence.

Authors

Lenc Karel, Vedaldi Andrea

Affiliation

Department of Engineering Science, University of Oxford, Oxford, UK.

Publication information

Int J Comput Vis. 2019;127(5):456-476. doi: 10.1007/s11263-018-1098-y. Epub 2018 May 18.

DOI:10.1007/s11263-018-1098-y

PMID:31148885

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC6510825/

Abstract

Despite the importance of image representations such as histograms of oriented gradients and deep Convolutional Neural Networks (CNN), our theoretical understanding of them remains limited. Aimed at filling this gap, we investigate two key mathematical properties of representations: equivariance and equivalence. Equivariance studies how transformations of the input image are encoded by the representation, invariance being a special case where a transformation has no effect. Equivalence studies whether two representations, for example two different parameterizations of a CNN, two different layers, or two different CNN architectures, share the same visual information or not. A number of methods to establish these properties empirically are proposed, including introducing transformation and stitching layers in CNNs. These methods are then applied to popular representations to reveal insightful aspects of their structure, including clarifying at which layers in a CNN certain geometric invariances are achieved and how various CNN architectures differ. We identify several predictors of geometric and architectural compatibility, including the spatial resolution of the representation and the complexity and depth of the models. While the focus of the paper is theoretical, direct applications to structured-output regression are demonstrated too.
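
The definitions in the abstract can be made concrete with a small, self-contained sketch. It is not the authors' method: the paper studies HOG and CNN representations and learns transformation layers empirically, whereas here the "representation" is a hypothetical toy (a 1-D circular cross-correlation with a fixed kernel) and the "transformation" is a circular shift, chosen only because equivariance holds exactly and can be checked with integer arithmetic. All function and variable names are illustrative assumptions.

```python
def circular_shift(signal, k):
    """Transformation g: shift a 1-D signal right by k positions, wrapping around."""
    n = len(signal)
    k %= n
    return signal[-k:] + signal[:-k] if k else list(signal)


def circular_correlate(signal, kernel):
    """Toy representation phi: circular cross-correlation with a fixed kernel."""
    n = len(signal)
    return [
        sum(kernel[j] * signal[(i + j) % n] for j in range(len(kernel)))
        for i in range(n)
    ]


def pooled(signal, kernel):
    """Toy representation psi: sum-pool phi over every position."""
    return sum(circular_correlate(signal, kernel))


x = [3, 1, 4, 1, 5, 9, 2, 6]   # arbitrary integer "image"
kernel = [1, 2, -1]            # arbitrary fixed filter
k = 3                          # shift amount

# Equivariance: transforming the input transforms the representation
# in a predictable way (here, by the same shift).
phi_of_gx = circular_correlate(circular_shift(x, k), kernel)
g_of_phi_x = circular_shift(circular_correlate(x, kernel), k)
equivariant = phi_of_gx == g_of_phi_x

# phi is equivariant but NOT invariant: its output does change under the shift.
phi_changes = phi_of_gx != circular_correlate(x, kernel)

# Invariance (special case): after global pooling the shift has no effect.
invariant = pooled(circular_shift(x, k), kernel) == pooled(x, kernel)

print(equivariant, phi_changes, invariant)
```

In this toy setting the map acting on the representation is known in advance (it is the shift itself). For learned representations that map is generally unknown, which is why the abstract speaks of establishing these properties empirically.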

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a0e4/6510825/976476a19915/11263_2018_1098_Fig1_HTML.jpg

Similar articles

Understanding Image Representations by Measuring Their Equivariance and Equivalence.

Int J Comput Vis. 2019;127(5):456-476. doi: 10.1007/s11263-018-1098-y. Epub 2018 May 18.

Roto-translation equivariant convolutional networks: Application to histopathology image analysis.

Med Image Anal. 2021 Feb;68:101849. doi: 10.1016/j.media.2020.101849. Epub 2020 Oct 31.

Learning Generalized Transformation Equivariant Representations Via AutoEncoding Transformations.

IEEE Trans Pattern Anal Mach Intell. 2022 Apr;44(4):2045-2057. doi: 10.1109/TPAMI.2020.3029801. Epub 2022 Mar 4.

SIFT-CNN: When Convolutional Neural Networks Meet Dense SIFT Descriptors for Image and Sequence Classification.

J Imaging. 2022 Sep 21;8(10):256. doi: 10.3390/jimaging8100256.

Ear Detection Using Convolutional Neural Network on Graphs with Filter Rotation.

Sensors (Basel). 2019 Dec 13;19(24):5510. doi: 10.3390/s19245510.

Study on Representation Invariances of CNNs and Human Visual Information Processing Based on Data Augmentation.

Brain Sci. 2020 Sep 2;10(9):602. doi: 10.3390/brainsci10090602.

Visual Features and Their Own Optical Flow.

Front Artif Intell. 2021 Dec 1;4:768516. doi: 10.3389/frai.2021.768516. eCollection 2021.

Examining the Coding Strength of Object Identity and Nonidentity Features in Human Occipito-Temporal Cortex and Convolutional Neural Networks.

J Neurosci. 2021 May 12;41(19):4234-4252. doi: 10.1523/JNEUROSCI.1993-20.2021. Epub 2021 Mar 31.

Interpretable CNNs for Object Classification.

IEEE Trans Pattern Anal Mach Intell. 2021 Oct;43(10):3416-3431. doi: 10.1109/TPAMI.2020.2982882. Epub 2021 Sep 2.

Neural representations of the perception of handwritten digits and visual objects from a convolutional neural network compared to humans.

Hum Brain Mapp. 2023 Apr 1;44(5):2018-2038. doi: 10.1002/hbm.26189. Epub 2023 Jan 13.

Cited by

Orientation-invariant autoencoders learn robust representations for shape profiling of cells and organelles.

Nat Commun. 2024 Feb 3;15(1):1022. doi: 10.1038/s41467-024-45362-4.

PARC: Physics-aware recurrent convolutional neural networks to assimilate meso scale reactive mechanics of energetic materials.

Sci Adv. 2023 Apr 28;9(17):eadd6868. doi: 10.1126/sciadv.add6868.

Revisiting Consistency for Semi-Supervised Semantic Segmentation.

Sensors (Basel). 2023 Jan 13;23(2):940. doi: 10.3390/s23020940.

Learning PDE to Model Self-Organization of Matter.

Entropy (Basel). 2022 Aug 9;24(8):1096. doi: 10.3390/e24081096.

A self-supervised domain-general learning framework for human ventral stream representation.

Nat Commun. 2022 Jan 25;13(1):491. doi: 10.1038/s41467-022-28091-4.

Self-supervised learning for using overhead imagery as maps in outdoor range sensor localization.

Int J Rob Res. 2021 Dec;40(12-14):1488-1509. doi: 10.1177/02783649211045736. Epub 2021 Sep 28.

Appearance-Based Sequential Robot Localization Using a Patchwise Approximation of a Descriptor Manifold.

Sensors (Basel). 2021 Apr 2;21(7):2483. doi: 10.3390/s21072483.

Using deep reinforcement learning to reveal how the brain encodes abstract state-space representations in high-dimensional environments.

Neuron. 2021 Feb 17;109(4):724-738.e7. doi: 10.1016/j.neuron.2020.11.021. Epub 2020 Dec 15.

Study on Representation Invariances of CNNs and Human Visual Information Processing Based on Data Augmentation.

Brain Sci. 2020 Sep 2;10(9):602. doi: 10.3390/brainsci10090602.

Neurally plausible mechanisms for learning selective and invariant representations.

J Math Neurosci. 2020 Aug 18;10(1):12. doi: 10.1186/s13408-020-00088-7.

References

Object Detection Networks on Convolutional Feature Maps.

IEEE Trans Pattern Anal Mach Intell. 2017 Jul;39(7):1476-1481. doi: 10.1109/TPAMI.2016.2601099. Epub 2016 Aug 17.

Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks.

IEEE Trans Pattern Anal Mach Intell. 2017 Jun;39(6):1137-1149. doi: 10.1109/TPAMI.2016.2577031. Epub 2016 Jun 6.

Fully Convolutional Networks for Semantic Segmentation.

IEEE Trans Pattern Anal Mach Intell. 2017 Apr;39(4):640-651. doi: 10.1109/TPAMI.2016.2572683. Epub 2016 May 24.

Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition.

IEEE Trans Pattern Anal Mach Intell. 2015 Sep;37(9):1904-16. doi: 10.1109/TPAMI.2015.2389824.

Invariant scattering convolution networks.

IEEE Trans Pattern Anal Mach Intell. 2013 Aug;35(8):1872-86. doi: 10.1109/TPAMI.2012.230.

Object detection with discriminatively trained part-based models.

IEEE Trans Pattern Anal Mach Intell. 2010 Sep;32(9):1627-45. doi: 10.1109/TPAMI.2009.167.

Performance evaluation of local descriptors.

IEEE Trans Pattern Anal Mach Intell. 2005 Oct;27(10):1615-30. doi: 10.1109/TPAMI.2005.188.
