QP-Adaptive Dual-Path Residual Integrated Frequency Transformer for Data-Driven In-Loop Filter in VVC.

Author information

Yeh Cheng-Hsuan, Ni Chi-Ting, Huang Kuan-Yu, Wu Zheng-Wei, Peng Cheng-Pin, Chen Pei-Yin

Affiliation

Department of Computer Science and Information Engineering, National Cheng Kung University, Tainan 70101, Taiwan.

Publication information

Sensors (Basel). 2025 Jul 7;25(13):4234. doi: 10.3390/s25134234.

Abstract

As AI-enabled embedded systems such as smart TVs and edge devices demand efficient video processing, Versatile Video Coding (VVC/H.266) becomes essential for bandwidth-constrained Multimedia Internet of Things (M-IoT) applications. However, its block-based coding often introduces compression artifacts. While CNN-based methods effectively reduce these artifacts, maintaining robust performance across varying quantization parameters (QPs) remains challenging. Recent QP-adaptive designs such as QA-Filter show promise but are still limited. This paper proposes DRIFT, a QP-adaptive in-loop filtering network for VVC. DRIFT combines a lightweight frequency fusion CNN (LFFCNN) for local enhancement with a Swin Transformer-based global skip connection for capturing long-range dependencies. LFFCNN leverages octave convolution and introduces a novel residual block (FFRB) that integrates multiscale extraction, QP adaptivity, frequency fusion, and spatial-channel attention. A QP estimator (QPE) is further introduced to mitigate double enhancement in inter-coded frames. Experimental results demonstrate that DRIFT achieves BD-rate reductions of 6.56% (intra) and 4.83% (inter), with a gain of up to 10.90% on the BasketballDrill sequence. Additionally, LFFCNN reduces the model size by 32% while slightly improving coding performance over QA-Filter.
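
For readers unfamiliar with the BD-rate figures quoted in the abstract, the following is a minimal sketch of how a Bjøntegaard delta rate is commonly computed: fit a cubic to log-bitrate as a function of PSNR for each codec, integrate both curves over the overlapping PSNR range, and convert the average log-rate difference to a percentage. The abstract does not say which tool the authors used, so this is an illustration of the metric, not their evaluation code. The function names (`bd_rate`, `_lagrange`, `_integral`) and all sample numbers are hypothetical, and the sketch assumes exactly four rate-distortion points per curve, in which case the cubic fit is an exact interpolation.

```python
import math

def _lagrange(xs, ys, x):
    """Evaluate the unique cubic through the four points (xs[i], ys[i]) at x."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total

def _integral(xs, ys, lo, hi):
    """Integrate that cubic over [lo, hi]; Simpson's rule is exact for cubics."""
    mid = 0.5 * (lo + hi)
    f = lambda x: _lagrange(xs, ys, x)
    return (hi - lo) / 6.0 * (f(lo) + 4.0 * f(mid) + f(hi))

def bd_rate(anchor_rates, anchor_psnr, test_rates, test_psnr):
    """Average bitrate change (%) of test vs. anchor at equal PSNR.

    Negative values mean the test codec needs fewer bits for the same quality.
    Assumption of this sketch: exactly four (rate, PSNR) points per curve.
    """
    sizes = {len(anchor_rates), len(anchor_psnr), len(test_rates), len(test_psnr)}
    if sizes != {4}:
        raise ValueError("this sketch expects exactly four points per curve")
    log_a = [math.log(r) for r in anchor_rates]
    log_t = [math.log(r) for r in test_rates]
    # Only the PSNR interval covered by both curves is compared.
    lo = max(min(anchor_psnr), min(test_psnr))
    hi = min(max(anchor_psnr), max(test_psnr))
    if hi <= lo:
        raise ValueError("PSNR ranges do not overlap")
    avg_diff = (_integral(test_psnr, log_t, lo, hi)
                - _integral(anchor_psnr, log_a, lo, hi)) / (hi - lo)
    return (math.exp(avg_diff) - 1.0) * 100.0

# Made-up example: the test codec uses 10% fewer bits at every PSNR point.
anchor_rates = [1000.0, 2000.0, 4000.0, 8000.0]  # kbps, illustrative only
psnr = [32.0, 35.0, 38.0, 41.0]                  # dB, illustrative only
test_rates = [0.9 * r for r in anchor_rates]
print(round(bd_rate(anchor_rates, psnr, test_rates, psnr), 2))  # prints -10.0
```

Under this convention, the reported 6.56% and 4.83% reductions correspond to BD-rate values of -6.56% and -4.83% relative to the anchor encoder.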

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/493d/12252514/af88da60617c/sensors-25-04234-g001.jpg
