IEEE Trans Image Process. 2021;30:697-711. doi: 10.1109/TIP.2020.3038348. Epub 2020 Dec 4.
This paper presents an iterative training of neural networks for intra prediction in a block-based image and video codec. First, the neural networks are trained on blocks produced by the codec's partitioning of images, each block paired with its context. Then, at each iteration, blocks are collected from the partitioning of images by the codec that now includes the neural networks trained at the previous iteration, each block again paired with its context, and the neural networks are retrained on the new pairs. With this training, the neural networks can learn intra prediction functions that both differ from those already in the initial codec and improve the codec's rate-distortion performance. Moreover, the iterative process makes it possible to design training data cleansings that are essential for the neural network training. When the iteratively trained neural networks are put into H.265 (HM-16.15), a mean BD-rate of -4.2% is obtained, i.e. a reduction 1.8% larger than that of the state of the art. When they are moved into H.266 (VTM-5.0), the mean BD-rate reaches -1.9%.
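The iterative procedure described in the abstract can be illustrated with a toy sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the "images" are 1D random-walk signals, the codec's partitioning is reduced to coding each 8-sample unit whole or as two 4-sample blocks, the only pre-existing mode is a hypothetical `dc_mode`, the neural networks are replaced by a least-squares affine model per block size (`learned_mode`, `fit`), distortion alone stands in for the rate-distortion cost, and the training data cleansing is not modeled.

```python
import random

CTX = 4  # toy choice: number of previously decoded samples used as context


def dc_mode(context, size):
    """Stand-in for a mode of the initial codec: repeat the context mean."""
    return [sum(context) / len(context)] * size


def learned_mode(model, context, size):
    """Stand-in for a neural network: affine map of the last context sample."""
    a, b = model
    return [a * context[-1] + b] * size


def sse(block, pred):
    return sum((x - p) ** 2 for x, p in zip(block, pred))


def best_cost(block, context, model):
    """Lowest distortion over the modes the current codec can choose from."""
    costs = [sse(block, dc_mode(context, len(block)))]
    if model is not None:
        costs.append(sse(block, learned_mode(model, context, len(block))))
    return min(costs)


def layout_cost(signal, layout, models):
    return sum(best_cost(signal[s:s + n], signal[s - CTX:s], models.get(n))
               for s, n in layout)


def partition(signal, models, split_penalty=1.0):
    """Code each 8-sample unit whole or as two 4-sample blocks, whichever is
    cheaper, and return the resulting (context, block) pairs."""
    pairs = []
    for start in range(CTX, len(signal) - 7, 8):
        whole = [(start, 8)]
        halves = [(start, 4), (start + 4, 4)]
        split = layout_cost(signal, halves, models) + split_penalty
        layout = whole if layout_cost(signal, whole, models) <= split else halves
        pairs += [(signal[s - CTX:s], signal[s:s + n]) for s, n in layout]
    return pairs


def fit(pairs, size):
    """Least-squares fit of (a, b) on the pairs whose block has this size."""
    xs, ys = [], []
    for context, block in pairs:
        if len(block) == size:
            xs += [context[-1]] * size
            ys += block
    if len(xs) < 2:
        return None  # no training data for this block size
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    var = sum((x - mx) ** 2 for x in xs)
    if var == 0:
        return None
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / var
    return (a, my - a * mx)


def iterative_training(signals, iterations=3):
    models = {}  # first pass: the codec has no learned mode yet
    for _ in range(iterations + 1):
        # Partition with the codec that includes the previous models,
        # then retrain every model on the newly collected pairs.
        pairs = [p for s in signals for p in partition(s, models)]
        models = {n: fit(pairs, n) for n in (4, 8)}
    return models


def make_signals(count=8, length=204, seed=0):
    """Hypothetical training data: Gaussian random walks."""
    rng = random.Random(seed)
    signals = []
    for _ in range(count):
        x, walk = 0.0, []
        for _ in range(length):
            x += rng.gauss(0.0, 1.0)
            walk.append(x)
        signals.append(walk)
    return signals


if __name__ == "__main__":
    print(iterative_training(make_signals()))
```

The point of the sketch is the feedback loop in `iterative_training`: once a learned mode competes inside the codec, the partitioning changes, so the distribution of training blocks changes too, which is why the pairs are collected again before each retraining.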