IEEE Trans Vis Comput Graph. 2021 May;27(5):2736-2745. doi: 10.1109/TVCG.2021.3067686. Epub 2021 Apr 15.
Virtual reality (VR) video streaming (a.k.a. 360-degree video streaming) has been gaining popularity recently as a new form of multimedia that provides users with an immersive viewing experience. However, the high data volume of 360-degree video frames creates significant bandwidth challenges. Research efforts have been made to reduce bandwidth consumption by predicting the user's viewport and selectively streaming only the corresponding portion of each frame. However, the existing approaches require historical user or video data and thus cannot be applied to live streaming, the most attractive VR streaming scenario. We develop a live viewport prediction mechanism, namely LiveObj, that detects the objects in the video based on their semantics. The detected objects are then tracked to infer the user's viewport in real time by employing a reinforcement learning algorithm. Our evaluations based on 48 users watching 10 VR videos demonstrate high prediction accuracy and significant bandwidth savings obtained by LiveObj. Moreover, LiveObj achieves real-time performance with low processing delays, meeting the requirements of live VR streaming.
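The core idea of tracking detected objects to decide which portion of a 360-degree frame to stream can be illustrated with a minimal sketch. This is not the paper's actual algorithm (LiveObj uses semantic object detection and reinforcement learning); it is only a hypothetical simplification that linearly extrapolates a tracked object's motion on an equirectangular frame and selects the surrounding tiles to transmit. All names, the grid layout, and the margin parameter are illustrative assumptions.

```python
# Illustrative sketch (NOT LiveObj's actual method): predict which
# equirectangular tiles to stream by linearly extrapolating the motion
# of one tracked object's center between two consecutive frames.

def predict_tiles(prev_center, curr_center,
                  grid=(6, 4), frame=(1920, 960), margin=1):
    """Return the set of (col, row) tile indices around the object's
    extrapolated next position, with a safety margin of extra tiles.

    prev_center, curr_center: (x, y) pixel coordinates of the tracked
    object's bounding-box center in the previous and current frames.
    grid: (columns, rows) of the tiling scheme; frame: (width, height).
    """
    dx = curr_center[0] - prev_center[0]
    dy = curr_center[1] - prev_center[1]
    # Linear extrapolation of the next frame's object center.
    nx = (curr_center[0] + dx) % frame[0]          # wrap horizontally (360 degrees)
    ny = min(max(curr_center[1] + dy, 0), frame[1] - 1)  # clamp vertically
    tile_w = frame[0] / grid[0]
    tile_h = frame[1] / grid[1]
    col, row = int(nx // tile_w), int(ny // tile_h)
    tiles = set()
    for dc in range(-margin, margin + 1):
        for dr in range(-margin, margin + 1):
            r = row + dr
            if 0 <= r < grid[1]:                   # no vertical wrap
                tiles.add(((col + dc) % grid[0], r))  # columns wrap around
    return tiles
```

With a 6x4 grid, streaming only the 9 tiles around the predicted position instead of all 24 would cut the transmitted area by more than half, which is the kind of bandwidth saving viewport-adaptive schemes aim for; a real system would additionally handle multiple objects and learn from the user's actual gaze feedback.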