Paul Somdyuti, Norkin Andrey, Bovik Alan C
IEEE Trans Image Process. 2020 Jul 28;PP. doi: 10.1109/TIP.2020.3011270.
In the VP9 video codec, block sizes are decided during encoding by recursively partitioning 64×64 superblocks using rate-distortion optimization (RDO). This process is computationally intensive because the search space of possible superblock partitions is combinatorial. Here, we propose an alternative, deep-learning-based framework that predicts intra-mode superblock partitions in the form of a four-level partition tree, using a hierarchical fully convolutional network (H-FCN). We created a large database of VP9 superblocks and their corresponding partitions to train an H-FCN model, which was subsequently integrated with the VP9 encoder to reduce intra-mode encoding time. The experimental results establish that our approach speeds up intra-mode encoding by 69.7% on average, at the expense of a 1.71% increase in the Bjøntegaard delta bitrate (BD-rate). VP9 provides several built-in speed levels designed to give faster encoding at the expense of rate-distortion performance; nevertheless, we find that our model outperforms the fastest recommended speed level of the reference VP9 encoder for the good quality intra encoding configuration, in terms of both speedup and BD-rate.
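To make the idea of a four-level partition tree concrete, the following is a minimal Python sketch of how such a tree for one 64×64 superblock could be expanded into its final coding blocks. It assumes VP9's four partition types (none, horizontal, vertical, split) applied at block sizes 64, 32, 16 and 8. The dictionary-based `Tree` representation, the `leaf_blocks` helper, and the rule that a missing entry means "no partition" are hypothetical choices made for illustration; they are not the output format of the H-FCN or of the VP9 encoder.

```python
from enum import Enum
from typing import Dict, List, Tuple


class Partition(Enum):
    NONE = 0   # keep the block whole
    HORZ = 1   # cut by a horizontal line: top half and bottom half
    VERT = 2   # cut by a vertical line: left half and right half
    SPLIT = 3  # four square quadrants, each partitioned further


# Hypothetical representation: (row, col, size) of a square block -> decision.
Tree = Dict[Tuple[int, int, int], Partition]
# A final coding block as (row, col, height, width).
Block = Tuple[int, int, int, int]


def leaf_blocks(tree: Tree, row: int = 0, col: int = 0,
                size: int = 64, min_size: int = 8) -> List[Block]:
    """Expand a partition tree into the list of final blocks it describes."""
    # Assumption for this sketch: a block without an entry is not partitioned.
    part = tree.get((row, col, size), Partition.NONE)
    half = size // 2
    if part is Partition.NONE:
        return [(row, col, size, size)]
    if part is Partition.HORZ:
        return [(row, col, half, size), (row + half, col, half, size)]
    if part is Partition.VERT:
        return [(row, col, size, half), (row, col + half, size, half)]
    # SPLIT: visit the four quadrants in raster order.
    blocks: List[Block] = []
    for r in (row, row + half):
        for c in (col, col + half):
            if size == min_size:
                # Fourth (lowest) level: the quadrants are final blocks.
                blocks.append((r, c, half, half))
            else:
                blocks.extend(leaf_blocks(tree, r, c, half, min_size))
    return blocks


# Example decisions at the 64, 32 and 16 levels (made up for illustration).
example_tree: Tree = {
    (0, 0, 64): Partition.SPLIT,
    (0, 0, 32): Partition.SPLIT,
    (0, 0, 16): Partition.VERT,
    (0, 32, 32): Partition.HORZ,
}
example_blocks = leaf_blocks(example_tree)
```

Whatever decisions the tree holds, the resulting blocks tile the superblock exactly, so their areas sum to 64 × 64 = 4096 pixels. The RDO search that the abstract describes has to evaluate many such trees per superblock, whereas a predictor only has to emit one.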