Wei Tianyi, Chen Dongdong, Zhou Wenbo, Liao Jing, Zhao Hanqing, Zhang Weiming, Hua Gang, Yu Nenghai
IEEE Trans Pattern Anal Mach Intell. 2024 Feb;46(2):881-895. doi: 10.1109/TPAMI.2023.3326693. Epub 2024 Jan 8.
Image matting is a fundamental and challenging problem in computer vision and graphics. Most existing matting methods rely on a user-supplied trimap as an auxiliary input to produce a good alpha matte. However, obtaining a high-quality trimap is itself arduous. Some hint-free methods have recently emerged, but their matting quality still lags far behind that of trimap-based methods. The main reason is that some hints are essential for removing semantic ambiguity and improving matting quality. There is thus a trade-off between interaction cost and matting quality. To balance performance and user-friendliness, we propose an improved deep image matting framework that is trimap-free and needs only sparse user clicks or scribbles, minimizing the required auxiliary constraints while still allowing interactivity. Moreover, we introduce uncertainty estimation that predicts which parts need polishing, and we conduct uncertainty-guided refinement. To trade off runtime against refinement quality, users can also choose among different refinement modes. Experimental results show that our method performs better than existing trimap-free methods and comparably to state-of-the-art trimap-based methods with minimal user effort. Finally, we demonstrate the extensibility of our framework to video human matting without any structural modification, by adding optical flow-based sparse hint propagation and temporal consistency regularization imposed on the single frame.
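The abstract does not say how uncertainty-guided refinement picks the regions to polish. The following is only an illustrative sketch, not the authors' implementation. It assumes the uncertainty estimate is a 2D map of per-pixel scores and that refinement works on non-overlapping square patches. The function name `select_uncertain_patches` and the parameters `patch` and `k` are hypothetical. A larger `k` stands in for a slower, more thorough refinement mode.

```python
from typing import List, Tuple


def select_uncertain_patches(
    uncertainty: List[List[float]], patch: int, k: int
) -> List[Tuple[int, int]]:
    """Return the top-left (row, col) of the k patches with the highest mean uncertainty.

    Assumption (not from the paper): refinement operates on non-overlapping
    square patches ranked by their average uncertainty.
    """
    rows, cols = len(uncertainty), len(uncertainty[0])
    scored = []
    for r in range(0, rows, patch):
        for c in range(0, cols, patch):
            # Collect the scores inside this patch, clipping at the map border.
            values = [
                uncertainty[i][j]
                for i in range(r, min(r + patch, rows))
                for j in range(c, min(c + patch, cols))
            ]
            scored.append((sum(values) / len(values), (r, c)))
    # Highest mean uncertainty first; the sort is stable, so ties keep scan order.
    scored.sort(key=lambda item: item[0], reverse=True)
    return [position for _, position in scored[:k]]
```

Under these assumptions, calling the function with a small `k` refines only the most ambiguous patches, and raising `k` spends more runtime to polish more of the matte.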