Mazzia Vittorio, Salvetti Francesco, Chiaberge Marcello
Department of Electronics and Telecommunications, Politecnico di Torino, 10129, Turin, Italy.
PIC4SeR, Politecnico di Torino Interdepartmental Centre for Service Robotics, Turin, Italy.
Sci Rep. 2021 Jul 19;11(1):14634. doi: 10.1038/s41598-021-93977-0.
Deep convolutional neural networks, assisted by architectural design strategies, make extensive use of data augmentation techniques and layers with a high number of feature maps to embed object transformations. This is highly inefficient and, for large datasets, implies a massive redundancy of feature detectors. Even though capsule networks are still in their infancy, they constitute a promising solution to extend current convolutional networks and endow artificial visual perception with a process to encode all feature affine transformations more efficiently. Indeed, a properly working capsule network should theoretically achieve higher results with a considerably lower parameter count, owing to its intrinsic capability to generalize to novel viewpoints. Nevertheless, little attention has been given to this relevant aspect. In this paper, we investigate the efficiency of capsule networks and, pushing their capacity to the limit with an extreme architecture of barely 160K parameters, we show that the proposed architecture still achieves state-of-the-art results on three different datasets with only 2% of the original CapsNet parameters. Moreover, we replace dynamic routing with a novel non-iterative, highly parallelizable routing algorithm that can easily cope with a reduced number of capsules. Extensive experimentation with other capsule implementations has demonstrated the effectiveness of our methodology and the capability of capsule networks to efficiently embed visual representations that generalize better.