Wang Weiran, Jing Minge, Fan Yibo, Weng Wei
School of Microelectronics, Fudan University, Shanghai 200433, China.
Department of Liberal Arts and Science, Kanazawa University, Ishikawa 920-1192, Japan.
Sensors (Basel). 2024 Mar 16;24(6):1907. doi: 10.3390/s24061907.
In recent years, the rapid proliferation of high-definition video in Internet of Things (IoT) systems has been directly facilitated by advances in imaging sensor technology. To cope with limited uplink bandwidth, most media platforms compress videos into low-bitrate streams for transmission. However, this compression often causes significant texture loss and artifacts, which severely degrade the Quality of Experience (QoE). We propose a latent feature diffusion model (LFDM) for compressed video quality enhancement, which comprises a compact edge latent feature prior network (ELPN) and a conditional noise prediction network (CNPN). Specifically, we first pre-train the ELPN to construct a latent feature space that captures rich detail information for representing sharpness latent variables. Second, we incorporate these latent variables into the prediction network to iteratively guide the generation direction; this preserves the inter-frame dependencies that are disrupted when diffusion models are applied directly to temporal prediction, and thereby completes the modeling of temporal correlations. Lastly, we develop a Grouped Domain Fusion module that addresses the diffusion distortion caused by naive cross-domain information fusion. Comparative experiments on the MFQEv2 benchmark validate our algorithm's superior performance in terms of both objective and subjective metrics. When integrated with codecs and image sensors, our method can deliver higher video quality.
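The abstract does not give implementation details, so the following is only a minimal, illustrative PyTorch sketch of the general idea it describes: a compact prior network encodes a compressed frame into a sharpness-aware latent, and that latent conditions a noise prediction network at every step of a standard DDPM reverse process. All class and function names here (EdgeLatentPrior, ConditionalNoisePredictor, ddpm_sample) and all layer choices are hypothetical stand-ins, not the paper's actual ELPN, CNPN, or Grouped Domain Fusion module.

```python
import torch
import torch.nn as nn

class EdgeLatentPrior(nn.Module):
    """Toy stand-in for an edge latent feature prior network:
    encodes a compressed frame into a compact sharpness-aware latent."""
    def __init__(self, in_ch=3, latent_ch=64):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(in_ch, latent_ch, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(latent_ch, latent_ch, 3, stride=2, padding=1),
        )

    def forward(self, frame):
        return self.encoder(frame)

class ConditionalNoisePredictor(nn.Module):
    """Toy stand-in for a conditional noise prediction network:
    predicts diffusion noise from the noisy frame plus the latent prior
    (timestep embedding omitted for brevity)."""
    def __init__(self, in_ch=3, latent_ch=64):
        super().__init__()
        self.latent_proj = nn.ConvTranspose2d(latent_ch, in_ch, 4, stride=4)
        self.body = nn.Sequential(
            nn.Conv2d(in_ch * 2, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, in_ch, 3, padding=1),
        )

    def forward(self, noisy, latent, t):
        cond = self.latent_proj(latent)                # upsample latent to frame resolution
        return self.body(torch.cat([noisy, cond], 1))  # predict noise conditioned on the prior

@torch.no_grad()
def ddpm_sample(cnpn, latent, betas, shape):
    """Plain DDPM reverse process, guided at every step by the latent prior."""
    alphas = 1.0 - betas
    alpha_bar = torch.cumprod(alphas, dim=0)
    x = torch.randn(shape)
    for t in reversed(range(len(betas))):
        eps = cnpn(x, latent, t)
        coef = betas[t] / torch.sqrt(1.0 - alpha_bar[t])
        mean = (x - coef * eps) / torch.sqrt(alphas[t])
        noise = torch.randn_like(x) if t > 0 else torch.zeros_like(x)
        x = mean + torch.sqrt(betas[t]) * noise
    return x

# Usage: enhance one 64x64 compressed frame with a 10-step toy noise schedule.
frame = torch.rand(1, 3, 64, 64)
latent = EdgeLatentPrior()(frame)
betas = torch.linspace(1e-4, 0.02, 10)
restored = ddpm_sample(ConditionalNoisePredictor(), latent, betas, frame.shape)
```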