IEEE Trans Pattern Anal Mach Intell. 2022 May;44(5):2534-2547. doi: 10.1109/TPAMI.2020.3036543. Epub 2022 Apr 1.
Generative adversarial networks have achieved great success in unpaired image-to-image translation. Cycle consistency, a key component for this task, allows modeling the relationship between two distinct domains without paired data. In this paper, we propose an alternative framework, an extension of latent space interpolation, that considers the intermediate region between two domains during translation. The framework rests on the assumption that in a flat and smooth latent space, many paths connect any two sample points. Selecting a path properly makes it possible to change only certain image attributes, which is useful for generating intermediate images between the two domains. Following this idea, our framework comprises an encoder, an interpolator, and a decoder. The encoder maps natural images to a convex and smooth latent space where interpolation is applicable. The interpolator controls the interpolation path so that the desired intermediate samples can be obtained. Finally, the decoder maps the interpolated features back to pixel space. We also show that by choosing different reference images and interpolation paths, the framework can be applied to multi-domain and multi-modal translation. Extensive experiments demonstrate that our framework achieves superior results and is flexible across various tasks.
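The encoder, interpolator, and decoder pipeline described in the abstract can be illustrated with a minimal sketch. This is not the authors' model. It assumes a toy element-wise logit encoder and sigmoid decoder in place of learned networks, and it assumes that each latent dimension corresponds to one image attribute. The names `encode`, `decode`, `interpolate`, and `path` are hypothetical and chosen only for illustration. The sketch shows how two different paths between the same pair of latent points yield different intermediate and final samples.

```python
import math

# Toy stand-ins for learned networks (assumption: not the paper's architecture).

def encode(pixels):
    """Toy encoder: logit maps each pixel in (0, 1) to the real line,
    so the latent space is convex and any interpolated point is valid."""
    return [math.log(p / (1.0 - p)) for p in pixels]

def decode(z):
    """Toy decoder: sigmoid inverts the logit, mapping back to pixel space."""
    return [1.0 / (1.0 + math.exp(-x)) for x in z]

def interpolate(z_a, z_b, v):
    """Move latent dimension i from z_a toward z_b by strength v[i] in [0, 1]."""
    return [a + s * (b - a) for a, b, s in zip(z_a, z_b, v)]

def path(mask, steps):
    """Strength vectors ramping 0 -> 1 only on dimensions where mask is 1.
    Requires steps >= 2."""
    return [[(k / (steps - 1)) * m for m in mask] for k in range(steps)]

# Made-up "images" represented as flat lists of pixel intensities.
img_a = [0.2, 0.4, 0.6, 0.8]
img_b = [0.9, 0.1, 0.3, 0.5]
z_a, z_b = encode(img_a), encode(img_b)

full = path([1, 1, 1, 1], steps=5)     # every latent dimension moves
partial = path([1, 1, 0, 0], steps=5)  # only the first two dimensions move

frames_full = [decode(interpolate(z_a, z_b, v)) for v in full]
frames_partial = [decode(interpolate(z_a, z_b, v)) for v in partial]
```

Under these assumptions, the full path ends at the second image, while the partial path ends at a hybrid that takes its first two values from the second image and keeps the rest from the first. In a learned latent space the attribute directions would generally not align with coordinate axes, so a per-dimension mask is a simplification.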