Xu Mai, Li Tianyi, Wang Zulin, Deng Xin, Yang Ren, Guan Zhenyu
IEEE Trans Image Process. 2018 Jun 13. doi: 10.1109/TIP.2018.2847035.
High Efficiency Video Coding (HEVC) significantly reduces bit-rates compared with the preceding H.264 standard, but at the expense of extremely high encoding complexity. In HEVC, the quad-tree partition of the coding unit (CU) accounts for a large proportion of the encoding complexity, because of the brute-force search used for rate-distortion optimization (RDO). This paper therefore proposes a deep learning approach, based on a convolutional neural network (CNN) and a long- and short-term memory (LSTM) network, that predicts the CU partition in order to reduce HEVC complexity in both intra- and inter-modes. First, we establish a large-scale database of CU partition data for HEVC intra- and inter-modes, which enables deep learning on the CU partition. Second, we represent the CU partition of an entire coding tree unit (CTU) as a hierarchical CU partition map (HCPM), and we propose an early-terminated hierarchical CNN (ETH-CNN) that learns to predict the HCPM. The encoding complexity of intra-mode HEVC can then be drastically reduced by replacing the brute-force search with ETH-CNN when deciding the CU partition. Third, we propose an early-terminated hierarchical LSTM (ETH-LSTM) to learn the temporal correlation of the CU partition, and we combine ETH-LSTM with ETH-CNN to predict the CU partition and so reduce HEVC complexity in inter-mode. Finally, experimental results show that our approach outperforms other state-of-the-art approaches in reducing HEVC complexity in both intra- and inter-modes.
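To make the idea of a hierarchical CU partition map more concrete, the following minimal Python sketch decodes one such map into a list of CUs. The abstract does not give the map's layout, so everything here is an assumption for illustration: a 64x64 CTU with a minimum CU size of 8x8 (a common HEVC configuration), three levels of split flags (one flag for the 64x64 CU, a 2x2 grid for the 32x32 CUs, and a 4x4 grid for the 16x16 CUs), and the hypothetical names `decode_hcpm` and `brute_force_cu_count`. It is not the authors' implementation.

```python
from typing import List, Optional, Tuple

# Assumed sizes: 64x64 CTU, 8x8 smallest CU (not stated in the abstract).
CTU_SIZE = 64
MIN_CU = 8

# Hypothetical map layout: a grid of split flags per level.
# 1 = split, 0 = not split, None = parent was not split (no decision needed).
Level = List[List[Optional[int]]]


def decode_hcpm(level1: int, level2: Level, level3: Level) -> List[Tuple[int, int, int]]:
    """Turn three levels of split flags into CUs given as (x, y, size)."""
    if not level1:
        # Early termination: the 64x64 CU is kept whole, lower levels are ignored.
        return [(0, 0, CTU_SIZE)]
    cus: List[Tuple[int, int, int]] = []
    for i in range(2):              # row of the 32x32 CU
        for j in range(2):          # column of the 32x32 CU
            if not level2[i][j]:
                # Early termination for this 32x32 CU.
                cus.append((32 * j, 32 * i, 32))
                continue
            for di in range(2):
                for dj in range(2):
                    r, c = 2 * i + di, 2 * j + dj   # position in the 4x4 grid
                    x16, y16 = 16 * c, 16 * r
                    if level3[r][c]:
                        # A split 16x16 CU yields four 8x8 CUs.
                        cus.extend((x16 + 8 * b, y16 + 8 * a, 8)
                                   for a in range(2) for b in range(2))
                    else:
                        cus.append((x16, y16, 16))
    return cus


def brute_force_cu_count(ctu: int = CTU_SIZE, min_cu: int = MIN_CU) -> int:
    """Count every candidate CU a full quad-tree search would have to evaluate."""
    count, size, per_level = 0, ctu, 1
    while size >= min_cu:
        count += per_level
        per_level *= 4
        size //= 2
    return count


# Example: only the top-left 32x32 CU is split, and within it only the
# top-left 16x16 CU is split further.
example_l2: Level = [[1, 0], [0, 0]]
example_l3: Level = [
    [1, 0, None, None],
    [0, 0, None, None],
    [None, None, None, None],
    [None, None, None, None],
]
example_cus = decode_hcpm(1, example_l2, example_l3)
```

Under these assumptions, a full quad-tree search evaluates every candidate CU at every depth (`brute_force_cu_count()` counts them), whereas a predicted map fixes the partition directly and skips lower levels wherever a parent CU is not split.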