Wu Zhenyu, Li Shuai, Chen Chenglizhao, Qin Hong, Hao Aimin
IEEE Trans Image Process. 2022;31:6649-6663. doi: 10.1109/TIP.2022.3214332. Epub 2022 Oct 26.
Recent advances in salient object detection (SOD) can largely be attributed to ever-stronger multi-scale feature representations enabled by deep learning. Existing deep SOD models extract multi-scale features with off-the-shelf encoders and combine them with carefully designed decoders. However, the kernel sizes in this commonly used pipeline are usually fixed. In our experiments, we observed that small kernels are preferable for scenes containing tiny salient objects, whereas large kernels can perform better for images with large salient objects. Inspired by this observation, we advocate dynamic scale routing, a new idea that yields a generic plug-in that can be fitted directly to existing feature backbones. The key technical innovations of this paper are twofold. First, instead of using vanilla convolution with fixed kernel sizes in the encoder, we propose dynamic pyramid convolution (DPConv), which dynamically selects the best-suited kernel sizes with respect to the given input. Second, we provide a self-adaptive bidirectional decoder designed to best accommodate the DPConv-based encoder. Its most significant highlight is the ability to route between feature scales and collect them dynamically, making the inference process scale-aware. As a result, the proposed approach improves on current state-of-the-art (SOTA) performance. Both the code and dataset are publicly available at https://github.com/wuzhenyubuaa/DPNet.
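The idea of selecting kernel sizes according to the input can be illustrated with a minimal sketch. This is not the authors' DPConv (see the linked repository for that); it is a toy under stated assumptions: the branches are fixed box filters rather than learned convolutions, the gate is a hypothetical global-average-pool plus linear-plus-softmax function, and the names `box_filter`, `route_weights`, `dynamic_scale_conv`, `GATE_W`, and `GATE_B` are invented for illustration. The gate parameters are hand-picked, not learned, so that a mostly dark image (tiny object) favors the small kernel and a mostly bright image (large object) favors the large one.

```python
import math

def box_filter(image, k):
    """Average each pixel over a k x k window (odd k, zero padding)."""
    h, w = len(image), len(image[0])
    r = k // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            total = 0.0
            for dy in range(-r, r + 1):
                for dx in range(-r, r + 1):
                    yy, xx = y + dy, x + dx
                    if 0 <= yy < h and 0 <= xx < w:
                        total += image[yy][xx]
            out[y][x] = total / (k * k)
    return out

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def route_weights(image, gate_w, gate_b):
    """Hypothetical gate: global average pooling, one linear score per branch, softmax."""
    h, w = len(image), len(image[0])
    pooled = sum(sum(row) for row in image) / (h * w)
    scores = [gw * pooled + gb for gw, gb in zip(gate_w, gate_b)]
    return softmax(scores)

def dynamic_scale_conv(image, kernel_sizes, gate_w, gate_b):
    """Run one branch per kernel size and mix the outputs with input-dependent weights."""
    weights = route_weights(image, gate_w, gate_b)
    branches = [box_filter(image, k) for k in kernel_sizes]
    h, w = len(image), len(image[0])
    mixed = [[0.0] * w for _ in range(h)]
    for wt, branch in zip(weights, branches):
        for y in range(h):
            for x in range(w):
                mixed[y][x] += wt * branch[y][x]
    return mixed, weights

# Hand-picked (assumed) gate parameters for kernel sizes 1, 3, 5.
KERNEL_SIZES = [1, 3, 5]
GATE_W = [-4.0, 0.0, 4.0]
GATE_B = [2.0, 0.0, -2.0]

if __name__ == "__main__":
    tiny = [[0.0] * 4 for _ in range(4)]
    tiny[1][2] = 1.0                       # a single bright pixel
    large = [[1.0] * 4 for _ in range(4)]  # an object filling the frame
    for name, img in (("tiny", tiny), ("large", large)):
        _, weights = dynamic_scale_conv(img, KERNEL_SIZES, GATE_W, GATE_B)
        print(name, [round(wt, 3) for wt in weights])
```

The soft mixture keeps every branch in play and only reweights them per input. A hard selection (keeping only the highest-weighted branch) would be an alternative design, and which of the two the published method uses is not stated in the abstract.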