Seungryong Kim, Dongbo Min, Bumsub Ham, Stephen Lin, Kwanghoon Sohn
IEEE Trans Pattern Anal Mach Intell. 2019 Mar;41(3):581-595. doi: 10.1109/TPAMI.2018.2803169. Epub 2018 Feb 7.
We present a descriptor, called fully convolutional self-similarity (FCSS), for dense semantic correspondence. Unlike traditional dense correspondence approaches for estimating depth or optical flow, semantic correspondence estimation poses additional challenges due to intra-class appearance and shape variations among different instances within the same object or scene category. To robustly match points across semantically similar images, we formulate FCSS using local self-similarity (LSS), which is inherently insensitive to intra-class appearance variations. LSS is incorporated through a proposed convolutional self-similarity (CSS) layer, where the sampling patterns and the self-similarity measure are jointly learned in an end-to-end and multi-scale manner. Furthermore, to address shape variations among different object instances, we propose a convolutional affine transformer (CAT) layer that estimates explicit affine transformation fields at each pixel to transform the sampling patterns and corresponding receptive fields. As training data for semantic correspondence is rather limited, we propose to leverage object candidate priors provided in most existing datasets and also correspondence consistency between object pairs to enable weakly-supervised learning. Experiments demonstrate that FCSS significantly outperforms conventional handcrafted descriptors and CNN-based descriptors on various benchmarks.
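To make the core idea concrete, below is a minimal PyTorch sketch of a local self-similarity (LSS) response: each pixel is described by how its own patch correlates with patches at a set of surrounding offsets, which is what makes the descriptor insensitive to intra-class appearance changes. This is not the paper's CSS layer — in FCSS the sampling pattern and the similarity measure are learned end-to-end and at multiple scales, whereas this sketch fixes both (cosine similarity over hand-picked offsets). The function name, the average-pooled patch proxy, and the circular boundary handling are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def local_self_similarity(feat, offsets, patch=3):
    """Fixed-pattern LSS sketch (assumed names; not the learned CSS layer).

    feat:    (B, C, H, W) image or feature tensor
    offsets: list of (dy, dx) integer sampling offsets in the local window
    patch:   side of the square patch compared against the center patch

    Returns (B, len(offsets), H, W): one similarity value per offset.
    """
    pad = patch // 2
    # Average-pool so each location summarizes its patch (cheap patch proxy).
    patches = F.avg_pool2d(feat, patch, stride=1, padding=pad)
    sims = []
    for dy, dx in offsets:
        # torch.roll wraps at the border; a simplification for this sketch.
        shifted = torch.roll(patches, shifts=(dy, dx), dims=(2, 3))
        # Cosine similarity between the center patch and the offset patch.
        sims.append(F.cosine_similarity(patches, shifted, dim=1))
    return torch.stack(sims, dim=1)

# Usage: 8 hand-picked offsets on a ring around each pixel.
offsets = [(-4, 0), (4, 0), (0, -4), (0, 4),
           (-3, -3), (3, 3), (-3, 3), (3, -3)]
lss = local_self_similarity(torch.randn(1, 3, 64, 64), offsets)
print(lss.shape)  # torch.Size([1, 8, 64, 64])
```

Because each output channel compares the image with itself, a global appearance shift (e.g., color or texture change between two instances of the same category) tends to cancel out, which is the property FCSS then learns to exploit.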
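The CAT layer's role can likewise be sketched: predict a per-pixel affine field and use it to warp a fixed set of sampling offsets, so the receptive field deforms with object shape. The conv head, its initialization toward identity, and the normalized-coordinate convention are assumptions for illustration; only the idea of pixel-wise affine transformation of sampling patterns comes from the abstract.

```python
import torch
import torch.nn as nn

class AffineSampler(nn.Module):
    """Sketch of a CAT-style layer (assumed architecture): a conv head
    predicts a 2x2 affine matrix per pixel, which warps fixed base offsets."""

    def __init__(self, in_ch, offsets):
        super().__init__()
        # 4 numbers per pixel: the entries of a 2x2 affine matrix A.
        self.head = nn.Conv2d(in_ch, 4, kernel_size=3, padding=1)
        # Fixed base offsets, shape (K, 2), in normalized [-1, 1] coords.
        self.register_buffer("offsets", offsets)

    def forward(self, feat):
        B, _, H, W = feat.shape
        A = self.head(feat).permute(0, 2, 3, 1).reshape(B, H, W, 2, 2)
        # Add identity so an untrained head leaves the offsets unchanged.
        A = A + torch.eye(2, device=feat.device)
        # Warp every base offset by the pixel-wise affine matrix:
        # (B, H, W, 2, 2) x (K, 2) -> (B, H, W, K, 2)
        return torch.einsum("bhwij,kj->bhwki", A, self.offsets)

# Usage: 4 base offsets; the warped offsets, added to a normalized base
# grid, can be passed to torch.nn.functional.grid_sample to gather
# features at the transformed sampling locations.
base = torch.tensor([[-0.1, 0.0], [0.1, 0.0], [0.0, -0.1], [0.0, 0.1]])
layer = AffineSampler(in_ch=64, offsets=base)
print(layer(torch.randn(2, 64, 32, 32)).shape)  # torch.Size([2, 32, 32, 4, 2])
```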