Faculty of Information Technology, University of Jyväskylä, Finland.
Department of Artificial Intelligence, Kharkiv National University of Radio Electronics, Ukraine.
Neural Netw. 2022 Nov;155:177-203. doi: 10.1016/j.neunet.2022.08.017. Epub 2022 Aug 23.
The convolutional neural network is one of the best-known members of the deep learning family of neural network architectures and is used for many purposes, including image classification. Despite their wide adoption, such networks are known to be highly tuned to the training data (samples representing a particular problem) and to be poorly reusable for new problems. One way to change this would be to apply, in addition to trainable weights, trainable parameters of the mathematical functions that simulate various neural computations within such networks. In this way, we may distinguish between narrowly focused task-specific parameters (weights) and more generic capability-specific parameters. In this paper, we suggest two flexible mathematical functions with trainable parameters (the Generalized Lehmer Mean and the Generalized Power Mean) to replace some fixed operations (such as the ordinary arithmetic mean or simple weighted aggregation) traditionally used within various components of a convolutional neural network architecture. We name the architecture with such an update a hyper-flexible convolutional neural network. We provide mathematical justification of the various components of this architecture and show experimentally that it performs better than the traditional one, including better robustness to adversarial perturbations of the testing data.
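To make the idea concrete, the sketch below illustrates the classical (non-generalized) Lehmer and power means and how such a mean could replace average pooling, with the mean's exponent `p` standing in for a trainable capability-specific parameter. This is a minimal illustration of the underlying mathematics, not the paper's actual generalized formulations or training procedure; the function names and the `lehmer_pool2d` helper are hypothetical.

```python
import numpy as np

def lehmer_mean(x, p):
    """Lehmer mean L_p(x) = sum(x_i^p) / sum(x_i^(p-1)).
    p = 1 gives the arithmetic mean; larger p shifts the result
    toward the maximum element (a max-pooling-like behavior)."""
    x = np.asarray(x, dtype=float)
    return np.sum(x ** p) / np.sum(x ** (p - 1))

def power_mean(x, p):
    """Power (Holder) mean M_p(x) = (mean(x_i^p))^(1/p).
    p = 1 is the arithmetic mean, p = -1 the harmonic mean,
    p = 2 the quadratic mean."""
    x = np.asarray(x, dtype=float)
    return np.mean(x ** p) ** (1.0 / p)

def lehmer_pool2d(feature_map, p=1.0, window=2):
    """Hypothetical pooling layer: replaces the fixed arithmetic
    average of each pooling window with a Lehmer mean whose
    exponent p would be learned alongside the weights."""
    h, w = feature_map.shape
    out = np.empty((h // window, w // window))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            patch = feature_map[i * window:(i + 1) * window,
                                j * window:(j + 1) * window]
            out[i, j] = lehmer_mean(patch.ravel(), p)
    return out
```

With `p = 1` the pooling reduces exactly to ordinary average pooling, so a network initialized this way starts from the traditional behavior and can learn to interpolate toward max-like pooling by increasing `p`.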