Song Yuda, He Zhuqing, Qian Hui, Du Xin
IEEE Trans Image Process. 2023;32:1927-1941. doi: 10.1109/TIP.2023.3256763. Epub 2023 Mar 24.
Image dehazing is a representative low-level vision task that estimates latent haze-free images from hazy images. In recent years, convolutional neural network-based methods have dominated image dehazing. However, vision Transformers, which have recently made breakthroughs in high-level vision tasks, have not brought new dimensions to image dehazing. We start with the popular Swin Transformer and find that several of its key designs are unsuitable for image dehazing. To this end, we propose DehazeFormer, which consists of various improvements, such as a modified normalization layer, activation function, and spatial information aggregation scheme. We train multiple variants of DehazeFormer on various datasets to demonstrate its effectiveness. Specifically, on the most frequently used SOTS indoor set, our small model outperforms FFA-Net with only 25% of the parameters and 5% of the computational cost. To the best of our knowledge, our large model is the first method with a PSNR over 40 dB on the SOTS indoor set, dramatically outperforming the previous state-of-the-art methods. We also collect a large-scale realistic remote sensing dehazing dataset for evaluating the method's capability to remove highly non-homogeneous haze. We share our code and dataset at https://github.com/IDKiro/DehazeFormer.
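The abstract describes component-level changes to a Swin-style Transformer block (normalization, activation, spatial aggregation) without giving details. Below is a minimal, hypothetical PyTorch sketch of such a block with pluggable normalization and activation layers; the class names (`DehazeBlockSketch`, `MLP`) and the use of plain multi-head attention instead of windowed attention are illustrative assumptions, not the authors' implementation (see the linked repository for that).

```python
# Hypothetical sketch (not the authors' code): a Transformer block whose
# normalization layer and activation function can be swapped out, to illustrate
# the kind of component-level modifications the abstract mentions.
import torch
import torch.nn as nn


class MLP(nn.Module):
    def __init__(self, dim, hidden_dim, act_layer=nn.ReLU):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim, hidden_dim),
            act_layer(),
            nn.Linear(hidden_dim, dim),
        )

    def forward(self, x):
        return self.net(x)


class DehazeBlockSketch(nn.Module):
    """Transformer block with pluggable norm/activation (hypothetical)."""

    def __init__(self, dim, num_heads, norm_layer=nn.LayerNorm, act_layer=nn.ReLU):
        super().__init__()
        self.norm1 = norm_layer(dim)
        # The real model uses window-based spatial aggregation; plain
        # multi-head attention keeps this sketch short and runnable.
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm2 = norm_layer(dim)
        self.mlp = MLP(dim, hidden_dim=4 * dim, act_layer=act_layer)

    def forward(self, x):  # x: (B, N, C) tokens
        h = self.norm1(x)
        h, _ = self.attn(h, h, h)
        x = x + h                          # residual around attention
        x = x + self.mlp(self.norm2(x))    # residual around MLP
        return x


# Usage: tokens from an 8x8 feature map with 96 channels.
tokens = torch.randn(2, 64, 96)
block = DehazeBlockSketch(dim=96, num_heads=3)
out = block(tokens)
print(out.shape)  # torch.Size([2, 64, 96])
```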