Paul Somdyuti, Norkin Andrey, Bovik Alan C
IEEE Trans Image Process. 2024;33:5114-5128. doi: 10.1109/TIP.2024.3455989. Epub 2024 Sep 19.
Adaptive video streaming relies on the construction of efficient bitrate ladders to deliver the best possible visual quality to viewers under bandwidth constraints. The traditional method of content-dependent bitrate ladder selection requires each video shot to be pre-encoded with multiple encoding parameters to find the optimal operating points, given by the convex hull of the resulting rate-quality curves. However, this pre-encoding step amounts to an exhaustive search over the space of possible encoding parameters, which incurs significant overhead in both computation and time. To reduce this overhead, we propose a deep-learning-based method of content-aware convex hull prediction. We employ a recurrent convolutional network (RCN) to implicitly analyze the spatiotemporal complexity of video shots in order to predict their convex hulls. A two-step transfer learning scheme is adopted to train our proposed RCN-Hull model, which ensures sufficient content diversity to analyze scene complexity, while also making it possible to capture the scene statistics of pristine source videos. Our experimental results show that the proposed model yields better approximations of the optimal convex hulls and offers competitive time savings compared with existing approaches. On average, our method reduced the pre-encoding time by 53.8%, while the average Bjøntegaard delta bitrate (BD-rate) of the predicted convex hulls against the ground truth was 0.26%, and the mean absolute deviation of the BD-rate distribution was 0.57%.
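The baseline the abstract describes, pre-encoding a shot at several encoding parameters and keeping the convex hull of the resulting rate-quality points, can be sketched as below. This is a minimal illustration, not the RCN-Hull method or the authors' pipeline. The `Encode` record, the `rate_quality_convex_hull` function, the resolution/QP labels, and all bitrate and quality numbers are hypothetical. The sketch assumes a linear bitrate axis; some pipelines build the hull on log bitrate instead. It computes the upper hull with Andrew's monotone chain and keeps only the rising part, because points past the quality peak spend more bits for less quality.

```python
from typing import List, NamedTuple


class Encode(NamedTuple):
    """One pre-encoded operating point (hypothetical record layout)."""
    label: str            # e.g. "720p@QP32" (illustrative resolution/QP tag)
    bitrate_kbps: float
    quality: float        # any quality score, e.g. VMAF or PSNR


def _cross(o: Encode, a: Encode, b: Encode) -> float:
    """z-component of (a - o) x (b - o) in the (bitrate, quality) plane."""
    return ((a.bitrate_kbps - o.bitrate_kbps) * (b.quality - o.quality)
            - (a.quality - o.quality) * (b.bitrate_kbps - o.bitrate_kbps))


def rate_quality_convex_hull(encodes: List[Encode]) -> List[Encode]:
    """Upper convex hull of rate-quality points, ordered by increasing
    bitrate and truncated at the highest-quality point."""
    # Sort by bitrate (best quality first at equal bitrate) and keep only
    # the best encode at each bitrate.
    ordered = sorted(encodes, key=lambda e: (e.bitrate_kbps, -e.quality))
    deduped: List[Encode] = []
    for e in ordered:
        if not deduped or e.bitrate_kbps != deduped[-1].bitrate_kbps:
            deduped.append(e)

    # Andrew's monotone chain, upper half: pop the last hull point while the
    # path through it does not turn clockwise (i.e. it lies on or below the chord).
    hull: List[Encode] = []
    for e in deduped:
        while len(hull) >= 2 and _cross(hull[-2], hull[-1], e) >= 0:
            hull.pop()
        hull.append(e)

    if not hull:
        return []
    # Beyond the quality peak the upper hull slopes downward; drop those points.
    peak = max(range(len(hull)), key=lambda i: hull[i].quality)
    return hull[: peak + 1]


# Made-up encodes of one shot at three resolutions.
sample = [
    Encode("720p@QP37", 1000, 70.0),
    Encode("1080p@QP37", 1500, 65.0),
    Encode("720p@QP32", 2000, 80.0),
    Encode("720p@QP27", 3000, 85.0),
    Encode("1080p@QP32", 3000, 88.0),
    Encode("1080p@QP27", 5000, 93.0),
    Encode("2160p@QP37", 7000, 92.0),
]
hull = rate_quality_convex_hull(sample)
print([e.label for e in hull])
# ['720p@QP37', '720p@QP32', '1080p@QP32', '1080p@QP27']
```

With these made-up numbers, the 1080p encode at 1500 kbps and the 720p encode at 3000 kbps fall below the hull, and the 2160p point is cut off because it costs more than `1080p@QP27` while scoring lower. Every point here had to be encoded before the hull could be found; a hull predictor such as RCN-Hull aims to approximate these hulls while running fewer pre-encodes.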