Lee Junghyup, Kim Dohyung, Lee Wonkyung, Ponce Jean, Ham Bumsub
IEEE Trans Pattern Anal Mach Intell. 2022 Mar;44(3):1399-1414. doi: 10.1109/TPAMI.2020.3013620. Epub 2022 Feb 3.
We address the problem of semantic correspondence, that is, establishing a dense flow field between images depicting different instances of the same object or scene category. We propose to use images annotated with binary foreground masks and subjected to synthetic geometric deformations to train a convolutional neural network (CNN) for this task. Using these masks as part of the supervisory signal provides an object-level prior for the semantic correspondence task and offers a good compromise between semantic flow methods, where the amount of training data is limited by the cost of manually selecting point correspondences, and semantic alignment ones, where the regression of a single global geometric transformation between images may be sensitive to image-specific details such as background clutter. We propose a new CNN architecture, dubbed SFNet, which implements this idea. It leverages a new and differentiable version of the argmax function for end-to-end training, with a loss that combines mask and flow consistency with smoothness terms. Experimental results demonstrate the effectiveness of our approach, which significantly outperforms the state of the art on standard benchmarks.
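The abstract mentions a differentiable version of the argmax function but does not give its form. As a rough illustration of the general idea only, the sketch below uses a plain soft-argmax: a softmax over matching scores followed by a probability-weighted average of grid coordinates. This is not necessarily the formulation used in SFNet. The names `soft_argmax_2d` and `flow_at` and the temperature parameter `beta` are assumptions introduced for this example.

```python
import math


def soft_argmax_2d(scores, beta=10.0):
    """Expected (x, y) position under a softmax over a 2D score map.

    scores: list of rows (lists of floats), e.g. matching scores between
            one source feature and every target location.
    beta:   temperature (hypothetical parameter); larger values make the
            result approach the hard argmax.
    """
    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(max(row) for row in scores)
    weights = [[math.exp(beta * (s - peak)) for s in row] for row in scores]
    total = sum(sum(row) for row in weights)

    # Probability-weighted average of column (x) and row (y) indices.
    x = sum(w * j for row in weights for j, w in enumerate(row)) / total
    y = sum(w * i for i, row in enumerate(weights) for w in row) / total
    return x, y


def flow_at(scores, src_x, src_y, beta=10.0):
    """Flow vector: soft-matched target position minus the source position."""
    tx, ty = soft_argmax_2d(scores, beta)
    return tx - src_x, ty - src_y


# Usage: a 3x4 score map with one clear peak at column 2, row 1.
example_scores = [
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 5.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
]
match_x, match_y = soft_argmax_2d(example_scores)
```

Unlike a hard argmax, every step here is a smooth function of the scores, so gradients can flow back to the features that produced them. The trade-off is that flat or multi-peaked score maps pull the result toward the average of the candidate positions rather than toward any single peak.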