Hu Ping, Wang Gang, Kong Xiangfei, Kuen Jason, Tan Yap-Peng
IEEE Trans Pattern Anal Mach Intell. 2020 Aug;42(8):1957-1967. doi: 10.1109/TPAMI.2019.2906175. Epub 2019 Mar 19.
In this work, we propose a motion-guided cascaded refinement network for video object segmentation. Assuming that foreground objects exhibit motion patterns distinct from the background, we apply an active contour model to the optical flow of each video frame to coarsely segment the foreground. The proposed Cascaded Refinement Network (CRN) then takes the coarse segmentation as guidance to generate an accurate segmentation at full resolution. In this way, the motion information and the deep CNNs complement each other well to accurately segment the foreground objects from video frames. To handle multi-instance cases, we extend our method with a spatial-temporal instance embedding model that further segments the foreground regions into instances and propagates instance labels. We also introduce a single-channel residual attention module in CRN that incorporates the coarse segmentation map as attention, which makes the network effective and efficient in both training and testing. Experiments on popular benchmarks show that our method achieves state-of-the-art performance with high time efficiency.
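The single-channel residual attention idea can be illustrated with a minimal sketch. A common residual-attention form is F * (1 + A), where the single-channel coarse segmentation map A modulates every feature channel while the residual term preserves the original features where attention is low. The function name and the exact formula below are illustrative assumptions, not the paper's verified implementation:

```python
import numpy as np

def residual_attention(features, coarse_map):
    """Hedged sketch of single-channel residual attention.

    features:   (C, H, W) feature maps from the refinement network
    coarse_map: (H, W) coarse segmentation probabilities in [0, 1]

    Computes F * (1 + A): the single attention channel is broadcast
    over all C feature channels, and the residual "+1" keeps the
    original features intact where the coarse map is zero. This is an
    assumed formulation for illustration only.
    """
    attention = coarse_map[np.newaxis, :, :]  # (1, H, W), broadcast over C
    return features * (1.0 + attention)

# Toy usage: 4-channel 2x2 features; the coarse map marks one
# foreground pixel, whose features are amplified while the rest pass
# through unchanged.
feats = np.ones((4, 2, 2))
coarse = np.array([[1.0, 0.0],
                   [0.0, 0.0]])
out = residual_attention(feats, coarse)
```

Because the attention is a single channel rather than one map per feature channel, it adds negligible parameters and computation, which is consistent with the efficiency claim in the abstract.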