IEEE Trans Neural Netw Learn Syst. 2020 Dec;31(12):5103-5115. doi: 10.1109/TNNLS.2019.2963282. Epub 2020 Nov 30.
Integrating human-provided location priors into video object segmentation has been shown to be an effective strategy for enhancing performance, but applying it at large scale is infeasible. Gamification can help reduce the annotation burden, but it still requires user involvement. We propose a video object segmentation framework that combines the advantages of user feedback and gamification: a reinforcement learning (RL) model simulates multiple game players by reproducing the human ability to pinpoint moving objects, and the simulated feedback drives the decisions of a fully convolutional deep segmentation network. Experimental results on the DAVIS-17 benchmark show that: 1) including a user-provided prior, even if imprecise, yields high performance; 2) our RL agent satisfactorily replicates human variability in identifying spatiotemporally salient objects; and 3) employing artificially generated priors in an unsupervised video object segmentation model achieves state-of-the-art performance.
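To make the idea of prior-driven segmentation concrete, the sketch below shows one plausible way a location prior (a human click or a simulated RL-agent click) could be fused with an RGB frame in a fully convolutional network. This is a minimal illustrative assumption, not the authors' architecture: the network layout, the early-fusion choice, and the Gaussian click encoding are all hypothetical.

```python
# Minimal sketch (NOT the paper's network): fusing a human- or agent-provided
# location prior with an RGB frame via early fusion in a fully convolutional net.
# Layer sizes, names, and the Gaussian click encoding are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class PriorGuidedFCN(nn.Module):
    """Toy fully convolutional net whose input is an RGB frame concatenated
    with a single-channel location prior (e.g., a Gaussian around a click)."""

    def __init__(self, base_channels: int = 16):
        super().__init__()
        # 3 RGB channels + 1 prior channel
        self.encoder = nn.Sequential(
            nn.Conv2d(4, base_channels, 3, stride=2, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(base_channels, base_channels * 2, 3, stride=2, padding=1),
            nn.ReLU(inplace=True),
        )
        self.head = nn.Conv2d(base_channels * 2, 1, 1)  # per-pixel foreground logit

    def forward(self, frame: torch.Tensor, prior: torch.Tensor) -> torch.Tensor:
        x = torch.cat([frame, prior], dim=1)       # early fusion of the prior map
        logits = self.head(self.encoder(x))
        # upsample logits back to the input resolution
        return F.interpolate(logits, size=frame.shape[-2:],
                             mode="bilinear", align_corners=False)


def gaussian_prior(h: int, w: int, cy: float, cx: float, sigma: float = 10.0) -> torch.Tensor:
    """Turn a (possibly noisy) click location into a soft spatial prior map."""
    ys = torch.arange(h).view(-1, 1).float()
    xs = torch.arange(w).view(1, -1).float()
    g = torch.exp(-((ys - cy) ** 2 + (xs - cx) ** 2) / (2 * sigma ** 2))
    return g.view(1, 1, h, w)


if __name__ == "__main__":
    net = PriorGuidedFCN()
    frame = torch.rand(1, 3, 128, 128)              # dummy RGB frame
    prior = gaussian_prior(128, 128, cy=64, cx=80)  # simulated user/agent click
    mask_logits = net(frame, prior)
    print(mask_logits.shape)                        # torch.Size([1, 1, 128, 128])
```

In this reading, replacing the human click with one produced by an RL agent leaves the segmentation network unchanged, which is what allows the simulated feedback to substitute for real users at scale.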