School of Computing, Gachon University, Seongnam 13120, Republic of Korea.
Sensors (Basel). 2023 Mar 7;23(6):2880. doi: 10.3390/s23062880.
Video deblurring aims to remove the motion blur caused by object movement or camera shake. Traditional methods have mainly focused on frame-based deblurring, which takes only blurry frames as input to produce sharp frames. However, frame-based deblurring yields poor picture quality in challenging restoration cases where severely blurred frames are given as input. To overcome this issue, recent studies have explored the event-based approach, which uses the event sequence captured by an event camera for motion deblurring. Event cameras have several advantages over conventional frame cameras. In particular, they have low latency in imaging-data acquisition (0.001 ms for event cameras vs. 10 ms for frame cameras), so event data can be acquired at a high rate (with up to microsecond resolution). The event sequence therefore contains more accurate motion information than video frames, and event data can be acquired with less motion blur. Owing to these advantages, event data are highly beneficial for improving the quality of deblurred frames; accordingly, event-based video deblurring outperforms frame-based methods even for severely blurred video frames. However, directly using event data can generate visual artifacts in the final output frame (e.g., image noise and incorrect textures), because event data intrinsically contain insufficient texture and event noise. To tackle this issue in event-based deblurring, we propose a two-stage coarse-refinement network that adds a frame-based refinement stage, which exploits all the available frames, with their more abundant textures, to further improve the picture quality of the first-stage coarse output. Specifically, a coarse intermediate frame is estimated by event-based video deblurring in the first-stage network.
A residual hint attention (RHA) module is also proposed to extract useful attention information from the coarse output and all the available frames. This module connects the two stages and effectively guides the frame-based refinement of the coarse output. The final deblurred frame is then obtained in the second-stage network by refining the coarse output using the residual hint attention and all the available frame information. We validated the deblurring performance of the proposed network on the GoPro synthetic dataset (33 videos, 4702 frames) and the HQF real dataset (11 videos, 2212 frames). Compared to the state-of-the-art method (D2Net), we achieved improvements of 1 dB in PSNR and 0.05 in SSIM on the GoPro dataset, and of 1.7 dB in PSNR and 0.03 in SSIM on the HQF dataset.
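The two-stage pipeline described above can be sketched in simplified form. This is a minimal, hypothetical NumPy illustration of the data flow only (coarse event-based deblurring, sigmoid-gated residual attention, frame-based refinement); all function names, shapes, and operations are placeholder assumptions, not the authors' network.

```python
import numpy as np

H, W = 64, 64

def event_deblur(blurry_frame, event_voxels):
    """Stage 1 (placeholder): fuse the blurry frame with event data to
    produce a coarse deblurred estimate."""
    # Stand-in for the event-based deblurring network.
    return np.clip(blurry_frame + 0.1 * event_voxels.mean(axis=0), 0.0, 1.0)

def residual_hint_attention(coarse, frames):
    """RHA (placeholder): derive an attention map from the coarse output
    and all the available frames via a sigmoid-gated residual."""
    residual = coarse - frames.mean(axis=0)   # hint: where coarse differs from frames
    return 1.0 / (1.0 + np.exp(-residual))    # sigmoid -> attention in (0, 1)

def refine(coarse, attention, frames):
    """Stage 2 (placeholder): frame-based refinement of the coarse output,
    guided by the attention map and frame textures."""
    texture = frames.mean(axis=0)             # abundant texture from all frames
    return attention * coarse + (1.0 - attention) * texture

# Illustrative inputs: one blurry frame, 5 event voxel bins, 3 neighbouring frames.
blurry = np.random.rand(H, W)
events = np.random.rand(5, H, W)
frames = np.random.rand(3, H, W)

coarse = event_deblur(blurry, events)          # first-stage coarse output
attn = residual_hint_attention(coarse, frames) # attention connecting the stages
final = refine(coarse, attn, frames)           # second-stage refined frame
print(final.shape)  # (64, 64)
```

In the actual method each placeholder would be a learned network; the sketch only conveys how the RHA map links the coarse output to the frame-based refinement stage.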