Li Tianyi, Xu Mai, Tang Runzhi, Chen Ying, Xing Qunliang
IEEE Trans Image Process. 2021;30:5377-5390. doi: 10.1109/TIP.2021.3083447. Epub 2021 Jun 3.
Versatile Video Coding (VVC), as the latest standard, significantly improves the coding efficiency over its predecessor standard High Efficiency Video Coding (HEVC), but at the expense of sharply increased complexity. In VVC, the quad-tree plus multi-type tree (QTMT) structure of the coding unit (CU) partition accounts for over 97% of the encoding time, due to the brute-force search for recursive rate-distortion (RD) optimization. Instead of the brute-force QTMT search, this paper proposes a deep learning approach to predict the QTMT-based CU partition, for drastically accelerating the encoding process of intra-mode VVC. First, we establish a large-scale database containing sufficient CU partition patterns with diverse video content, which can facilitate the data-driven VVC complexity reduction. Next, we propose a multi-stage exit CNN (MSE-CNN) model with an early-exit mechanism to determine the CU partition, in accord with the flexible QTMT structure at multiple stages. Then, we design an adaptive loss function for training the MSE-CNN model, synthesizing both the uncertain number of split modes and the target on minimized RD cost. Finally, a multi-threshold decision scheme is developed, achieving a desirable trade-off between complexity and RD performance. The experimental results demonstrate that our approach can reduce the encoding time of VVC by 44.65%-66.88% with a negligible Bjøntegaard delta bit-rate (BD-BR) of 1.322%-3.188%, significantly outperforming other state-of-the-art approaches.
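The abstract does not spell out the decision rule, so the following is only a minimal sketch of how a per-stage threshold combined with an early exit might work. It is not the authors' implementation. The function names (`candidate_modes`, `decide`), the mode labels (`"NS"` for non-split, `"QT"`, `"BTH"`, `"BTV"`), the rule "keep every mode whose predicted probability reaches the stage threshold", and the fallback to the single most probable mode are all assumptions made for illustration. The sketch also follows one CU along a single path instead of the full partition tree.

```python
from typing import Dict, List, Sequence


def candidate_modes(probs: Dict[str, float], threshold: float) -> List[str]:
    """Keep the split modes whose predicted probability reaches the threshold.

    Assumed rule (not taken from the paper): if no mode reaches the
    threshold, fall back to the single most probable mode.
    """
    kept = [mode for mode, p in probs.items() if p >= threshold]
    if not kept:
        kept = [max(probs, key=probs.get)]
    return kept


def decide(stage_probs: Sequence[Dict[str, float]],
           thresholds: Sequence[float]) -> List[List[str]]:
    """Apply one threshold per stage and stop early once only "NS" survives.

    stage_probs[i] holds the hypothetical predicted probabilities at stage i;
    thresholds[i] is the hypothetical threshold used at that stage.
    """
    decisions: List[List[str]] = []
    for probs, tau in zip(stage_probs, thresholds):
        kept = candidate_modes(probs, tau)
        decisions.append(kept)
        if kept == ["NS"]:  # early exit: the CU is not split any further
            break
    return decisions


if __name__ == "__main__":
    stages = [
        {"NS": 0.1, "QT": 0.9},
        {"NS": 0.8, "BTH": 0.15, "BTV": 0.05},
        {"NS": 0.5, "QT": 0.5},  # never reached because of the early exit
    ]
    print(decide(stages, [0.5, 0.5, 0.5]))
```

Under these assumptions, a lower threshold keeps more candidate modes for the encoder's RD check (slower, smaller RD loss), while a higher threshold prunes more aggressively (faster, larger RD loss). That is the kind of complexity versus RD trade-off the abstract attributes to the multi-threshold scheme.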