Jiang Xiaoheng, Pang Yanwei, Sun Manli, Li Xuelong
IEEE Trans Neural Netw Learn Syst. 2018 Jul;29(7):2684-2694. doi: 10.1109/TNNLS.2017.2689098. Epub 2017 May 12.
Conventional convolutional neural networks use either a linear or a nonlinear filter to extract features from an image patch (region) of spatial size H × W (typically, H is small and equal to W, e.g., H is 5 or 7). Generally, the size of the filter is equal to the size H × W of the input patch. We argue that the representational ability of this equal-size strategy is not strong enough. To overcome this drawback, we propose a subpatch filter whose spatial size h × w is smaller than H × W. The proposed subpatch filter consists of two successive filters. The first is a linear filter of spatial size h × w that extracts features from the spatial domain. The second is of spatial size 1 × 1 and is used to strengthen the connection between different input feature channels and to reduce the number of parameters. The subpatch filter is convolved with the input patch, and the resulting network is called a subpatch network. Taking the output of one subpatch network as input, we repeatedly construct further subpatch networks until the output contains only one neuron in the spatial domain. Together, these subpatch networks form a new network called the cascaded subpatch network (CSNet). The feature layer generated by CSNet is called the csconv layer. For the whole input image, we construct a deep neural network by stacking a sequence of csconv layers. Experimental results on five benchmark data sets demonstrate the effectiveness and compactness of the proposed CSNet. For example, our CSNet reaches a test error of 5.68% on the CIFAR10 data set without model averaging. To the best of our knowledge, this is the best result ever obtained on the CIFAR10 data set.
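The cascading rule described above (apply a small spatial filter followed by a channel-mixing filter, and repeat until a single spatial position remains) can be illustrated with a little arithmetic. The sketch below is not the authors' implementation. It assumes square patches and filters, stride 1, no padding, a 1 × 1 second filter, the same channel width at every stage, and weight counts that ignore biases. The function names and the channel width of 64 are hypothetical choices made only for illustration.

```python
def cascade_sizes(patch, sub):
    """Spatial sizes seen by a cascade of sub x sub filters on a patch x patch input.

    Assumes stride 1 and no padding, so each stage shrinks the side by sub - 1.
    The cascade stops when one spatial position remains.
    """
    if sub < 2 or (patch - 1) % (sub - 1) != 0:
        raise ValueError("cascade does not end at exactly one spatial position")
    sizes = [patch]
    while sizes[-1] > 1:
        sizes.append(sizes[-1] - sub + 1)
    return sizes


def equal_size_params(patch, c_in, c_out):
    """Weights in a single filter bank whose size equals the patch (biases ignored)."""
    return patch * patch * c_in * c_out


def subpatch_params(sub, c_in, c_mid, c_out):
    """Weights in one subpatch stage: a sub x sub filter followed by a 1 x 1 filter."""
    return sub * sub * c_in * c_mid + c_mid * c_out


def cascade_params(patch, sub, channels):
    """Total weights in the cascade, assuming every stage uses the same channel width."""
    stages = len(cascade_sizes(patch, sub)) - 1
    return stages * subpatch_params(sub, channels, channels, channels)


if __name__ == "__main__":
    for patch in (5, 7):
        print(patch, cascade_sizes(patch, 3),
              equal_size_params(patch, 64, 64),
              cascade_params(patch, 3, 64))
```

Under these assumptions, a 5 × 5 patch covered by 3 × 3 subpatch filters needs two stages (5, then 3, then 1), and a 7 × 7 patch needs three. The weight comparison depends entirely on the assumed channel widths, so it should be read as an illustration of the bookkeeping and not as a reproduction of the paper's reported model sizes.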