IEEE Trans Pattern Anal Mach Intell. 2019 Nov;41(11):2693-2708. doi: 10.1109/TPAMI.2018.2858783. Epub 2018 Jul 24.
Panoramic video provides an immersive and interactive experience by enabling humans to control the field of view (FoV) through head movement (HM). Thus, HM plays a key role in modeling human attention on panoramic video. This paper establishes a database collecting subjects' HM in panoramic video sequences. From this database, we find that the HM data are highly consistent across subjects. Furthermore, we find that deep reinforcement learning (DRL) can be applied to predict HM positions by maximizing the reward for imitating human HM scanpaths through the agent's actions. Based on our findings, we propose a DRL-based HM prediction (DHP) approach with offline and online versions, called offline-DHP and online-DHP. In offline-DHP, multiple DRL workflows are run to determine potential HM positions at each panoramic frame. Then, a heat map of the potential HM positions, named the HM map, is generated as the output of offline-DHP. In online-DHP, the next HM position of one subject is estimated given the currently observed HM position, which is achieved by developing a DRL algorithm on top of the learned offline-DHP model. Finally, the experiments validate that our approach is effective in both offline and online prediction of HM positions for panoramic video, and that the learned offline-DHP model can improve the performance of online-DHP.
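The abstract describes generating a heat map (the HM map) from the potential HM positions produced by the DRL workflows. The paper does not specify the heat-map construction here; the sketch below is a minimal, hypothetical illustration of one common choice, accumulating a Gaussian kernel at each predicted HM position on a coarse equirectangular grid. The function name, grid size, and `sigma` are assumptions, not the authors' implementation.

```python
import math

def hm_heatmap(positions, width=64, height=32, sigma=2.0):
    """Hypothetical HM-map sketch: sum a Gaussian kernel at each
    predicted HM position (x, y) on a height x width grid, then
    normalize the result to [0, 1]."""
    heat = [[0.0] * width for _ in range(height)]
    for (x, y) in positions:
        for r in range(height):
            for c in range(width):
                # Squared distance from grid cell (c, r) to the HM position.
                d2 = (c - x) ** 2 + (r - y) ** 2
                heat[r][c] += math.exp(-d2 / (2.0 * sigma ** 2))
    peak = max(max(row) for row in heat)
    if peak > 0.0:
        heat = [[v / peak for v in row] for row in heat]
    return heat

# Example: three predicted HM positions, two of them nearby.
hm = hm_heatmap([(10, 8), (12, 9), (50, 20)])
```

In this sketch, nearby predictions reinforce each other, so clusters of agreement across DRL workflows show up as hotter regions, which matches the abstract's finding that HM data are highly consistent across subjects. A real implementation would also need to handle the horizontal wrap-around of the equirectangular panorama, which this toy grid ignores.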