Vision Transformers in Image Restoration: A Survey.

Affiliations

Robotics and Internet-of-Things Laboratory, Prince Sultan University, Riyadh 12435, Saudi Arabia.

Department of Electronics and Electrical Communications Engineering, Faculty of Electronic Engineering, Menoufia University, Menouf 32952, Egypt.

Publication information

Sensors (Basel). 2023 Feb 21;23(5):2385. doi: 10.3390/s23052385.

Abstract

The Vision Transformer (ViT) architecture has been remarkably successful in image restoration. For a while, Convolutional Neural Networks (CNNs) predominated in most computer vision tasks. Now, both CNNs and ViTs are efficient approaches that demonstrate powerful capabilities to restore a better version of an image given in a low-quality format. In this study, the efficiency of ViT in image restoration is studied extensively. The ViT architectures are classified for every task of image restoration. Seven image restoration tasks are considered: Image Super-Resolution, Image Denoising, General Image Enhancement, JPEG Compression Artifact Reduction, Image Deblurring, Removing Adverse Weather Conditions, and Image Dehazing. The outcomes, the advantages, the limitations, and the possible areas for future research are detailed. Overall, it is noted that incorporating ViT in new architectures for image restoration is becoming the rule. This is due to some advantages compared to CNNs, such as better efficiency, especially when more data are fed to the network, robustness in feature extraction, and a feature learning approach that better captures the variances and characteristics of the input. Nevertheless, some drawbacks exist, such as the need for more data to show the benefits of ViT over CNNs, the increased computational cost due to the complexity of the self-attention block, a more challenging training process, and the lack of interpretability. These drawbacks represent the future research directions that should be targeted to increase the efficiency of ViT in the image restoration domain.
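
The computational cost attributed to the self-attention block can be made concrete with a rough count. The sketch below is illustrative only and is not taken from the survey: the function name `attention_cost` is hypothetical, and the defaults (16-pixel patches, embedding width 768) are assumed values in the style of a ViT-Base configuration. It counts only the two token-by-token matrix products of global self-attention in a single layer and ignores the linear projections and the MLP.

```python
def attention_cost(height, width, patch=16, dim=768):
    """Rough per-layer cost of global self-attention on a patchified image.

    Assumed (not from the survey): non-overlapping square patches and a
    single global attention layer over all patch tokens.
    """
    # Each non-overlapping patch becomes one token.
    tokens = (height // patch) * (width // patch)
    # Q @ K^T and (attention weights) @ V each need about tokens^2 * dim
    # multiply-accumulates, summed over all heads.
    attn_macs = 2 * tokens * tokens * dim
    # Each head stores a tokens x tokens score matrix.
    scores_per_head = tokens * tokens
    return tokens, attn_macs, scores_per_head


if __name__ == "__main__":
    for side in (224, 448, 896):
        tokens, macs, scores = attention_cost(side, side)
        print(f"{side}x{side}: {tokens} tokens, "
              f"{scores} scores per head, ~{macs / 1e9:.2f} GMACs")
```

Under these assumptions, doubling the image side length quadruples the number of tokens and multiplies the attention cost by sixteen. This matters for restoration tasks, which typically operate on full-resolution images.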

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6121/10006889/ac20deb6ac35/sensors-23-02385-g001.jpg
