IEEE Trans Pattern Anal Mach Intell. 2022 May;44(5):2416-2425. doi: 10.1109/TPAMI.2020.3041871. Epub 2022 Apr 1.
We introduce a novel and generic convolutional unit, the DiCE unit, built from dimension-wise convolutions and dimension-wise fusion. The dimension-wise convolutions apply lightweight convolutional filtering across each dimension of the input tensor, while dimension-wise fusion efficiently combines these dimension-wise representations, allowing the DiCE unit to encode the spatial and channel-wise information contained in the input tensor efficiently. The DiCE unit is simple and can be seamlessly integrated with any architecture to improve its efficiency and performance. Compared to depth-wise separable convolutions, the DiCE unit shows significant improvements across different architectures. When DiCE units are stacked to build the DiCENet model, we observe significant improvements over state-of-the-art models across various computer vision tasks, including image classification, object detection, and semantic segmentation. On the ImageNet dataset, DiCENet delivers 2-4 percent higher accuracy than state-of-the-art manually designed models (e.g., MobileNetv2 and ShuffleNetv2). DiCENet also generalizes better to tasks that are often run on resource-constrained devices (e.g., object detection) than state-of-the-art separable convolution-based efficient networks, including neural search-based methods (e.g., MobileNetv3 and MixNet).
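To make the idea of filtering "across each dimension of the input tensor" concrete, the following is a minimal pure-Python sketch, not the authors' implementation. It rests on several assumptions that the abstract does not state: the input is a single tensor indexed as x[c][h][w], each slice along a given axis gets its own odd-sized k x k kernel (as in a depth-wise convolution, but applied along the channel, height, and width axes in turn), and borders are zero padded. The function names `per_slice_filter`, `dimwise_conv`, and `fuse` are hypothetical, and `fuse` is a plain weighted sum standing in for the dimension-wise fusion step, whose actual design is not described here.

```python
def per_slice_filter(x, kernels, axis):
    """Filter every slice taken along `axis` with its own k x k kernel.

    x is a nested list indexed as x[c][h][w]; kernels[s] is the k x k kernel
    for slice s along `axis`. Zero padding keeps the output the same size.
    """
    dims = (len(x), len(x[0]), len(x[0][0]))
    k = len(kernels[0])
    pad = k // 2
    a, b = [d for d in range(3) if d != axis]  # the two axes being filtered
    out = [[[0.0] * dims[2] for _ in range(dims[1])] for _ in range(dims[0])]
    for c in range(dims[0]):
        for h in range(dims[1]):
            for w in range(dims[2]):
                pos = (c, h, w)
                acc = 0.0
                for i in range(k):
                    for j in range(k):
                        p = list(pos)
                        p[a] += i - pad
                        p[b] += j - pad
                        # Positions outside the tensor count as zero.
                        if 0 <= p[a] < dims[a] and 0 <= p[b] < dims[b]:
                            acc += kernels[pos[axis]][i][j] * x[p[0]][p[1]][p[2]]
                out[c][h][w] = acc
    return out


def dimwise_conv(x, k_depth, k_height, k_width):
    """Return one filtered copy of x per tensor dimension (assumed layout C, H, W)."""
    return [
        per_slice_filter(x, k_depth, 0),   # one kernel per channel
        per_slice_filter(x, k_height, 1),  # one kernel per row index
        per_slice_filter(x, k_width, 2),   # one kernel per column index
    ]


def fuse(branches, weights):
    """Placeholder fusion: element-wise weighted sum of the branches.

    This is an illustrative stand-in, not the paper's dimension-wise fusion.
    """
    c, h, w = len(branches[0]), len(branches[0][0]), len(branches[0][0][0])
    return [[[sum(wt * br[ci][hi][wi] for wt, br in zip(weights, branches))
              for wi in range(w)] for hi in range(h)] for ci in range(c)]
```

Under these assumptions, each branch stores only one k x k kernel per slice along its axis, which is what keeps the filtering lightweight; the explicit loops favor readability over speed.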