Vision Transformers in Image Restoration: A Survey.

Affiliations

Robotics and Internet-of-Things Laboratory, Prince Sultan University, Riyadh 12435, Saudi Arabia.

Department of Electronics and Electrical Communications Engineering, Faculty of Electronic Engineering, Menoufia University, Menouf 32952, Egypt.

Publication information

Sensors (Basel). 2023 Feb 21;23(5):2385. doi: 10.3390/s23052385.

DOI: 10.3390/s23052385
PMID: 36904589
Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC10006889/
Abstract

The Vision Transformer (ViT) architecture has been remarkably successful in image restoration. For some time, Convolutional Neural Networks (CNNs) dominated most computer vision tasks. Now both CNNs and ViTs are effective approaches with a powerful ability to restore a higher-quality version of a low-quality input image. In this study, the efficiency of ViT in image restoration is studied extensively, and ViT architectures are classified by image restoration task. Seven tasks are considered: Image Super-Resolution, Image Denoising, General Image Enhancement, JPEG Compression Artifact Reduction, Image Deblurring, Removing Adverse Weather Conditions, and Image Dehazing. The outcomes, advantages, limitations, and possible areas for future research are detailed. Overall, incorporating ViT into new image restoration architectures is becoming the norm. This is due to several advantages over CNNs: better efficiency, especially when more data are fed to the network; robustness in feature extraction; and a feature learning approach that better captures the variations and characteristics of the input. Nevertheless, some drawbacks exist, such as the need for more data before ViT shows its benefits over CNNs, the increased computational cost due to the complexity of the self-attention block, a more challenging training process, and a lack of interpretability. These drawbacks point to the future research directions that should be targeted to increase the efficiency of ViT in image restoration.
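The abstract attributes part of ViT's extra computational cost to the self-attention block. The sketch below is a minimal pure-Python illustration, not any architecture from the survey: it splits a grayscale image into non-overlapping p × p patches (one token per patch), applies single-head scaled dot-product self-attention with randomly initialised projection matrices, and counts the entries of the resulting N × N attention matrix. The patch size, projection width, random weights, and all function names are assumptions chosen for illustration.

```python
import math
import random


def image_to_patches(img, p):
    """Split an H x W grayscale image (list of rows) into non-overlapping
    p x p patches and flatten each patch into one token vector."""
    h, w = len(img), len(img[0])
    if h % p or w % p:
        raise ValueError("image height and width must be divisible by p")
    tokens = []
    for i in range(0, h, p):          # patch rows, top to bottom
        for j in range(0, w, p):      # patch columns, left to right
            tokens.append([img[i + di][j + dj] for di in range(p) for dj in range(p)])
    return tokens


def matmul(a, b):
    """Plain matrix product of a (n x m) and b (m x k)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax of one row of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x, wq, wk, wv):
    """Single-head scaled dot-product self-attention over N tokens.

    Returns the N x d output and the N x N attention matrix; building that
    matrix is the step whose cost grows quadratically with N."""
    q, k, v = matmul(x, wq), matmul(x, wk), matmul(x, wv)
    scale = math.sqrt(len(wk[0]))
    scores = [[sum(a * b for a, b in zip(qi, kj)) / scale for kj in k] for qi in q]
    attn = [softmax(row) for row in scores]
    return matmul(attn, v), attn


def attention_entries(h, w, p):
    """Number of entries in the N x N attention matrix for an h x w image."""
    n_tokens = (h // p) * (w // p)
    return n_tokens * n_tokens


if __name__ == "__main__":
    rng = random.Random(0)
    p, d = 4, 8  # hypothetical patch size and projection width
    img = [[rng.random() for _ in range(16)] for _ in range(16)]
    tokens = image_to_patches(img, p)  # 16 tokens, each of length p * p = 16
    wq, wk, wv = ([[rng.gauss(0, 0.1) for _ in range(d)] for _ in range(p * p)] for _ in range(3))
    out, attn = self_attention(tokens, wq, wk, wv)
    print(len(out), len(out[0]), len(attn), len(attn[0]))  # 16 8 16 16
    for size in (64, 128, 256):
        # 64 -> 65536, 128 -> 1048576, 256 -> 16777216 entries
        print(size, attention_entries(size, size, p))
```

Because every token attends to every other token, doubling the image side length quadruples the number of tokens N and multiplies the size of the attention matrix by 16. This quadratic growth is one concrete reason full self-attention becomes expensive on large images.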


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6121/10006889/39de332cbbac/sensors-23-02385-g011.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6121/10006889/ac20deb6ac35/sensors-23-02385-g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6121/10006889/b18290ba3d18/sensors-23-02385-g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6121/10006889/aff760833dc1/sensors-23-02385-g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6121/10006889/de46534ea6b7/sensors-23-02385-g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6121/10006889/43a2e40a74c8/sensors-23-02385-g005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6121/10006889/b6855a50d200/sensors-23-02385-g006.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6121/10006889/8f8b85d3ee5e/sensors-23-02385-g007.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6121/10006889/3857537ee657/sensors-23-02385-g008.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6121/10006889/85afecb1e435/sensors-23-02385-g009.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6121/10006889/c5ab84ef39e6/sensors-23-02385-g010.jpg

Similar articles

1
Vision Transformers in Image Restoration: A Survey.
Sensors (Basel). 2023 Feb 21;23(5):2385. doi: 10.3390/s23052385.
2
RT-ViT: Real-Time Monocular Depth Estimation Using Lightweight Vision Transformers.
Sensors (Basel). 2022 May 19;22(10):3849. doi: 10.3390/s22103849.
3
Pure Vision Transformer (CT-ViT) with Noise2Neighbors Interpolation for Low-Dose CT Image Denoising.
J Imaging Inform Med. 2024 Oct;37(5):2669-2687. doi: 10.1007/s10278-024-01108-8. Epub 2024 Apr 15.
4
An Efficient Dehazing Algorithm Based on the Fusion of Transformer and Convolutional Neural Network.
Sensors (Basel). 2022 Dec 21;23(1):43. doi: 10.3390/s23010043.
5
MuSiC-ViT: A multi-task Siamese convolutional vision transformer for differentiating change from no-change in follow-up chest radiographs.
Med Image Anal. 2023 Oct;89:102894. doi: 10.1016/j.media.2023.102894. Epub 2023 Jul 12.
6
DistilIQA: Distilling Vision Transformers for no-reference perceptual CT image quality assessment.
Comput Biol Med. 2024 Jul;177:108670. doi: 10.1016/j.compbiomed.2024.108670. Epub 2024 May 28.
7
EfficientUNetViT: Efficient Breast Tumor Segmentation Utilizing UNet Architecture and Pretrained Vision Transformer.
Bioengineering (Basel). 2024 Sep 21;11(9):945. doi: 10.3390/bioengineering11090945.
8
Transformer-based progressive residual network for single image dehazing.
Front Neurorobot. 2022 Dec 6;16:1084543. doi: 10.3389/fnbot.2022.1084543. eCollection 2022.
9
Classification of Mobile-Based Oral Cancer Images Using the Vision Transformer and the Swin Transformer.
Cancers (Basel). 2024 Feb 29;16(5):987. doi: 10.3390/cancers16050987.
10
Ultrasound Image Analysis with Vision Transformers-Review.
Diagnostics (Basel). 2024 Mar 4;14(5):542. doi: 10.3390/diagnostics14050542.

Articles citing this article

1
Performance of deep learning models for the classification and object detection of different oral white lesions using photographic images.
Sci Rep. 2025 Aug 22;15(1):30834. doi: 10.1038/s41598-025-14450-w.
2
Imaging Transformer for MRI Denoising: a Scalable Model Architecture that enables ≪ 1 Imaging.
ArXiv. 2025 Apr 13:arXiv:2504.10534v1.
3
Binary and Multi-Class Classification of Colorectal Polyps Using CRP-ViT: A Comparative Study Between CNNs and QNNs.
Life (Basel). 2025 Jul 17;15(7):1124. doi: 10.3390/life15071124.
4
Development and Validation of a Multi-Task Artificial Intelligence-Assisted System for Small Bowel Capsule Endoscopy.
Int J Gen Med. 2025 May 12;18:2521-2536. doi: 10.2147/IJGM.S522587. eCollection 2025.
5
Hybrid Deep Learning Framework for Continuous User Authentication Based on Smartphone Sensors.
Sensors (Basel). 2025 Apr 30;25(9):2817. doi: 10.3390/s25092817.
6
An enhanced image restoration using deep learning and transformer based contextual optimization algorithm.
Sci Rep. 2025 Mar 25;15(1):10324. doi: 10.1038/s41598-025-94449-5.
7
Non-small cell lung cancer detection through knowledge distillation approach with teaching assistant.
PLoS One. 2024 Nov 6;19(11):e0306441. doi: 10.1371/journal.pone.0306441. eCollection 2024.
8
A joint learning framework for multisite CBCT-to-CT translation using a hybrid CNN-transformer synthesizer and a registration network.
Front Oncol. 2024 Aug 8;14:1440944. doi: 10.3389/fonc.2024.1440944. eCollection 2024.
9
Multi-Branch Network for Color Image Denoising Using Dilated Convolution and Attention Mechanisms.
Sensors (Basel). 2024 Jun 3;24(11):3608. doi: 10.3390/s24113608.
10
GNViT- An enhanced image-based groundnut pest classification using Vision Transformer (ViT) model.
PLoS One. 2024 Mar 25;19(3):e0301174. doi: 10.1371/journal.pone.0301174. eCollection 2024.