IEEE Trans Pattern Anal Mach Intell. 2023 Mar;45(3):3768-3782. doi: 10.1109/TPAMI.2022.3181587. Epub 2023 Feb 3.
We tackle the problem of semantic image layout manipulation, which aims to edit an input image by editing its semantic label map. A core problem of this task is how to transfer visual details from the input image to the new semantic layout while keeping the resulting image visually realistic. Recent work on learning cross-domain correspondence has shown promising results for global layout transfer with dense attention-based warping. However, this approach tends to lose texture details because of its limited resolution and the lack of a smoothness constraint on the correspondence. To adapt this paradigm to the layout manipulation task, we propose a high-resolution sparse attention module that effectively transfers visual details to new layouts at resolutions up to 512x512. To further improve visual quality, we introduce a novel generator architecture consisting of a semantic encoder and a two-stage decoder for coarse-to-fine synthesis. Experiments on the ADE20k and Places365 datasets demonstrate that our approach achieves substantial improvements over existing inpainting and layout manipulation methods.
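The abstract does not say how the sparse attention module selects or weights correspondences. As a rough illustration of the general idea only, this sketch restricts attention-based warping so that each target location attends to just its top-k most similar source locations instead of all of them. The function name sparse_attention_warp, the cosine similarity, the top-k selection rule, and the temperature parameter are hypothetical choices for illustration, not the authors' module.

```python
import math
from typing import List

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two feature vectors (0.0 if either is all zeros)."""
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


def sparse_attention_warp(
    query_feats: List[Vector],  # features at target-layout locations
    key_feats: List[Vector],    # features at source-image locations
    values: List[Vector],       # per-location source content (e.g. RGB) to transfer
    k: int = 2,
    temperature: float = 0.01,
) -> List[Vector]:
    """Hypothetical top-k sparse attention warp (illustrative, not the paper's module).

    Dense warping would softmax over every source location; here each query
    keeps only its k highest-scoring source locations before the softmax.
    """
    if not 1 <= k <= len(key_feats):
        raise ValueError("k must be in [1, number of source locations]")
    dim = len(values[0])
    warped: List[Vector] = []
    for q in query_feats:
        scores = [cosine(q, key) for key in key_feats]
        # Sparsity: indices of the k most similar source locations.
        top = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:k]
        # Softmax over the retained scores only; subtract the max for stability.
        m = max(scores[j] for j in top)
        weights = [math.exp((scores[j] - m) / temperature) for j in top]
        total = sum(weights)
        # Weighted sum of the retained source values.
        out = [0.0] * dim
        for w, j in zip(weights, top):
            for d in range(dim):
                out[d] += (w / total) * values[j][d]
        warped.append(out)
    return warped
```

With k equal to the number of source locations this reduces to ordinary dense soft attention; a small k limits each output to a few matches, which is one plausible way to keep memory manageable at higher resolutions.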