IEEE Trans Pattern Anal Mach Intell. 2023 Jul;45(7):8594-8605. doi: 10.1109/TPAMI.2022.3227116. Epub 2023 Jun 5.
This article explores how to harvest precise object segmentation masks while minimizing the human interaction cost. To achieve this, we propose a simple yet effective interaction scheme, named Inside-Outside Guidance (IOG). Concretely, we leverage an inside point that is clicked near the object center and two outside points at symmetrical corner locations (top-left and bottom-right, or top-right and bottom-left) of an almost-tight bounding box that encloses the target object. Because the two clicked corners determine the remaining two corners of the box, the interaction yields a total of one foreground click and four background clicks for segmentation. The advantages of our IOG are four-fold: 1) the two outside points help remove distractions from other objects or the background; 2) the inside point helps eliminate unrelated regions inside the bounding box; 3) the inside and outside points are easy to identify, reducing the confusion that the state-of-the-art DEXTR (Maninis et al. 2018) causes when labeling some extreme samples; 4) it naturally supports additional click annotations for further correction. Despite its simplicity, our IOG not only achieves state-of-the-art performance on several popular benchmarks such as GrabCut (Rother et al. 2004), PASCAL (Everingham et al. 2010) and MS COCO (Russakovsky et al. 2015), but also demonstrates strong generalization capability across different domains such as street scenes (Cityscapes, Cordts et al. 2016), aerial imagery (Rooftop, Sun et al. 2014, and Agriculture-Vision, Chiu et al. 2020) and medical images (ssTEM, Gerhard et al. 2013). Code is available at https://github.com/shiyinzhang/Inside-Outside-Guidance.
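The click bookkeeping described in the abstract can be sketched as follows. This is only an illustrative sketch, not the authors' implementation: the function name `iog_clicks`, the `(x, y)` pixel convention, and the validation rules are assumptions introduced here. It shows how two diagonally opposite outside clicks plus one inside click expand into one foreground point and four background points (the corners of the implied bounding box).

```python
from typing import List, Tuple

Point = Tuple[int, int]  # (x, y) in pixel coordinates (assumed convention)


def iog_clicks(
    inside: Point, corner_a: Point, corner_b: Point
) -> Tuple[List[Point], List[Point]]:
    """Hypothetical helper: expand IOG clicks into foreground/background points.

    corner_a and corner_b are the two outside clicks at diagonally opposite
    corners of the box; either diagonal is accepted.
    """
    (xa, ya), (xb, yb) = corner_a, corner_b
    x_min, x_max = sorted((xa, xb))
    y_min, y_max = sorted((ya, yb))

    # The two outside clicks must span a box with non-zero width and height.
    if x_min == x_max or y_min == y_max:
        raise ValueError("outside clicks must be diagonally opposite corners")

    # The inside click is expected to fall strictly within the box.
    x, y = inside
    if not (x_min < x < x_max and y_min < y < y_max):
        raise ValueError("inside click must lie within the bounding box")

    # All four box corners serve as background points, listed clockwise
    # starting from the corner with the smallest x and y.
    background = [
        (x_min, y_min),
        (x_max, y_min),
        (x_max, y_max),
        (x_min, y_max),
    ]
    return [inside], background


# Example: outside clicks on one diagonal, inside click near the object center.
foreground, background = iog_clicks((50, 40), (10, 70), (90, 5))
print(foreground)  # one foreground click
print(background)  # four background clicks
```

Clicking the other diagonal of the same box gives the same four background points, which is why the scheme can accept either pair of corners.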