IEEE Trans Image Process. 2023;32:2237-2251. doi: 10.1109/TIP.2023.3266165. Epub 2023 Apr 21.
The Versatile Video Coding (VVC) standard introduces a block partitioning structure known as quadtree plus nested multi-type tree (QTMTT), which allows more flexible block partitioning than its predecessors such as High Efficiency Video Coding (HEVC). Consequently, the partition search (PS) process, which looks for the partitioning structure that minimizes the rate-distortion cost, is far more complicated in VVC than in HEVC. Moreover, the PS process in the VVC reference software (VTM) is not friendly to hardware implementation. We propose a partition map prediction method for fast block partitioning in VVC intra-frame encoding. The proposed method may replace PS entirely or be combined with a partial PS, thereby achieving adjustable acceleration of VTM intra-frame encoding. Unlike previous methods for fast block partitioning, we represent a QTMTT-based block partitioning structure by a partition map, which consists of a quadtree (QT) depth map, several multi-type tree (MTT) depth maps, and several MTT direction maps. We then predict the optimal partition map from the pixels using a convolutional neural network (CNN). The CNN structure, named Down-Up-CNN, emulates the recursive nature of the PS process. In addition, we design a post-processing algorithm that adjusts the partition map output by the network so as to obtain a standard-compliant block partitioning structure. The post-processing algorithm may also produce a partial partition tree; in that case, PS is performed starting from the partial tree to obtain the full tree. Experimental results show that the proposed method achieves 1.61× to 8.64× encoding acceleration for the VTM-10.0 intra-frame encoder, with the ratio depending on how much PS is performed. In particular, at 3.89× encoding acceleration, the compression efficiency loss is 2.77% in BD-rate, a better tradeoff than that of previous methods.
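To make the partition-map representation more concrete, the following minimal Python sketch shows one way a quadtree could be written out as a QT depth map. It is an illustration under stated assumptions, not the paper's implementation: the names `PartitionMap` and `rasterize_quadtree`, the nested-list tree encoding, and the 4×4 grid resolution are all hypothetical. The MTT depth and direction maps are left empty because the abstract does not specify their layout.

```python
from dataclasses import dataclass, field
from typing import List, Optional

Grid = List[List[int]]


@dataclass
class PartitionMap:
    """Hypothetical container mirroring the three map types named in the abstract."""
    qt_depth: Grid                                             # one QT depth per grid cell
    mtt_depth: List[Grid] = field(default_factory=list)       # layout not given in the abstract
    mtt_direction: List[Grid] = field(default_factory=list)   # layout not given in the abstract


def rasterize_quadtree(tree: Optional[list], grid_size: int) -> Grid:
    """Write the depth of every quadtree leaf into a grid_size x grid_size map.

    Assumed encoding: a leaf is None; a split node is a list of four
    subtrees in the order top-left, top-right, bottom-left, bottom-right.
    grid_size is assumed to be a power of two.
    """
    grid = [[0] * grid_size for _ in range(grid_size)]

    def fill(node, row, col, size, depth):
        if node is None:
            # Leaf block: every cell it covers records the leaf's depth.
            for r in range(row, row + size):
                for c in range(col, col + size):
                    grid[r][c] = depth
            return
        if size < 2:
            raise ValueError("tree is deeper than the grid resolution allows")
        half = size // 2
        offsets = [(0, 0), (0, half), (half, 0), (half, half)]
        for child, (dr, dc) in zip(node, offsets):
            fill(child, row + dr, col + dc, half, depth + 1)

    fill(tree, 0, 0, grid_size, 0)
    return grid


# Example: the root is split once, and its top-right quadrant is split again.
example_tree = [None, [None, None, None, None], None, None]
example_map = PartitionMap(qt_depth=rasterize_quadtree(example_tree, 4))

for row in example_map.qt_depth:
    print(row)
```

A representation of this kind turns a tree-structured decision into fixed-size arrays, which is the form a CNN can predict directly from pixels. The predicted arrays need not correspond to a valid tree, which is why the abstract describes a post-processing step to obtain a standard-compliant structure.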