School of Computer Science, Northwestern Polytechnical University, Xi'an, 710072, Shaanxi, China; School of Artificial Intelligence, OPtics and ElectroNics (iOPEN), Northwestern Polytechnical University, Xi'an, 710072, Shaanxi, China.
School of Artificial Intelligence, OPtics and ElectroNics (iOPEN), Northwestern Polytechnical University, Xi'an, 710072, Shaanxi, China; Key Laboratory of Intelligent Interaction and Application (Northwestern Polytechnical University), Ministry of Industry and Information Technology, Northwestern Polytechnical University, Xi'an, 710072, Shaanxi, China.
Neural Netw. 2024 Dec;180:106673. doi: 10.1016/j.neunet.2024.106673. Epub 2024 Aug 30.
Image harmonization seeks to transfer the illumination distribution of the background to the foreground within a composite image. Existing methods lack the ability to establish global-local pixel illumination dependencies between the foreground and background of composite images, which is indispensable for generating sharp and color-consistent harmonized images. To overcome this challenge, we design a novel Simple Hybrid CNN-Transformer Network (SHT-Net), which is formulated as an efficient symmetrical hierarchical architecture. It is composed of two newly designed lightweight Transformer blocks. First, the scale-aware gated block is designed to capture multi-scale features through different heads and to expand the receptive field, which facilitates generating images with fine-grained details. Second, we introduce a simple parallel attention block, which integrates window-based self-attention and gated channel attention in parallel, enabling simultaneous modeling of global and local pixel illumination relationships. In addition, we propose an efficient simple feed-forward network that filters out less informative features and passes through only the features that contribute to generating photo-realistic harmonized results. Extensive experiments on image harmonization benchmarks indicate that our method achieves promising quantitative and qualitative results. The code and pre-trained models are available at https://github.com/guanguanboy/SHT-Net.
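The parallel attention idea described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation (see the linked repository for that); it only shows one plausible way to run a window-based self-attention branch (local) and a gated channel attention branch (global) side by side. The function names, the single-head attention, the sigmoid gate over globally pooled features, and the fusion by residual summation are all assumptions made for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def window_self_attention(x, window, wq, wk, wv):
    """Local branch (hypothetical): single-head self-attention restricted to
    non-overlapping window x window patches.

    x has shape (H, W, C); H and W must be divisible by `window`.
    """
    H, W, C = x.shape
    # Partition into windows: (num_windows, window * window, C).
    xw = x.reshape(H // window, window, W // window, window, C)
    xw = xw.transpose(0, 2, 1, 3, 4).reshape(-1, window * window, C)
    q, k, v = xw @ wq, xw @ wk, xw @ wv
    attn = softmax(q @ k.transpose(0, 2, 1) / np.sqrt(C), axis=-1)
    out = attn @ v
    # Merge the windows back into an (H, W, C) feature map.
    out = out.reshape(H // window, W // window, window, window, C)
    return out.transpose(0, 2, 1, 3, 4).reshape(H, W, C)


def gated_channel_attention(x, w_gate):
    """Global branch (hypothetical): pool over all pixels, then rescale each
    channel with a sigmoid gate computed from the pooled vector."""
    pooled = x.mean(axis=(0, 1))                      # (C,)
    gate = 1.0 / (1.0 + np.exp(-(pooled @ w_gate)))   # (C,), values in (0, 1)
    return x * gate


def parallel_attention_block(x, window, params):
    # Both branches see the same input; fusing them by residual summation
    # is an assumption of this sketch.
    local = window_self_attention(
        x, window, params["wq"], params["wk"], params["wv"]
    )
    global_ = gated_channel_attention(x, params["w_gate"])
    return x + local + global_


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    C = 4
    params = {
        name: rng.standard_normal((C, C)) * 0.1
        for name in ("wq", "wk", "wv", "w_gate")
    }
    feat = rng.standard_normal((8, 8, C))
    print(parallel_attention_block(feat, 4, params).shape)
```

In this sketch the local branch only mixes pixels inside the same window, while the gate in the global branch depends on every pixel in the feature map, which is the sense in which the two branches are complementary.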