Ho Quoc-Thien, Duong Minh-Thien, Lee Seongsoo, Hong Min-Cheol
Department of Information and Telecommunication Engineering, Soongsil University, Seoul 06978, Republic of Korea.
Department of Automatic Control, Ho Chi Minh City University of Technology and Education, Ho Chi Minh City 70000, Vietnam.
Sensors (Basel). 2024 Oct 10;24(20):6545. doi: 10.3390/s24206545.
Motion of an object or of the camera platform blurs the acquired image. This degradation is a major cause of poor-quality images from imaging sensors, so an efficient deep-learning-based method for removing blur artifacts is desirable. Deep learning has recently demonstrated significant efficacy in image deblurring, primarily through convolutional neural networks (CNNs) and Transformers. However, the limited receptive fields of CNNs restrict their ability to capture long-range structural dependencies. In contrast, Transformers excel at modeling such dependencies but are computationally expensive for high-resolution inputs and lack an appropriate inductive bias. To overcome these challenges, we propose an Efficient Hybrid Network (EHNet) that employs CNN encoders for local feature extraction and Transformer decoders with a dual-attention module to capture spatial and channel-wise dependencies. This synergy facilitates the acquisition of rich contextual information for high-quality image deblurring. Additionally, we introduce the Simple Feature-Embedding Module (SFEM), which replaces the pointwise and depthwise convolutions used to generate embedding features in the self-attention mechanism with a simplified alternative. This substantially reduces computational complexity and memory usage while maintaining overall performance. Finally, comprehensive experiments show that our compact model yields promising quantitative and qualitative deblurring results on various benchmark datasets.
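The abstract does not specify the internals of the SFEM, but the stated idea is to replace the pointwise-plus-depthwise convolutional embeddings in self-attention with a cheaper one. A minimal NumPy sketch of that general notion, under the assumption (ours, not the paper's) that a single shared linear projection produces the query, key, and value embeddings instead of three separate convolutional branches:

```python
import numpy as np

def simplified_self_attention(x, w_embed):
    """Self-attention over a sequence of tokens x of shape (N, C).

    Hypothetical stand-in for an SFEM-like design: one shared
    projection w_embed (C, C) yields Q = K = V, rather than three
    separate pointwise + depthwise convolution branches.
    Returns (output, attention_weights).
    """
    e = x @ w_embed                      # single shared embedding
    q = k = v = e
    scores = q @ k.T / np.sqrt(x.shape[1])
    # numerically stable softmax over each row
    scores -= scores.max(axis=1, keepdims=True)
    attn = np.exp(scores)
    attn /= attn.sum(axis=1, keepdims=True)
    return attn @ v, attn

rng = np.random.default_rng(0)
x = rng.standard_normal((16, 8))         # 16 tokens, 8 channels
w = rng.standard_normal((8, 8)) / np.sqrt(8)
y, attn = simplified_self_attention(x, w)
print(y.shape)                           # (16, 8)
```

With C channels, the shared projection costs one C-by-C weight matrix, versus roughly three pointwise C-by-C matrices plus depthwise kernels in a conventional convolutional embedding, which is consistent with the claimed reduction in computation and memory; the exact SFEM design and savings are given in the paper itself.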