Laveglia Vincenzo, Trentin Edmondo
DINFO, Università di Firenze, Via di S. Marta 3, 50139 Firenze, Italy.
DIISM, Università di Siena, Via Roma 56, 53100 Siena, Italy.
Entropy (Basel). 2023 Apr 28;25(5):733. doi: 10.3390/e25050733.
A major issue in the application of deep learning is the definition of a proper architecture for the learning machine at hand, such that the model is neither excessively large (which results in overfitting the training data) nor too small (which limits the learning and modeling capabilities of the automatic learner). This issue has driven the development of algorithms that automatically grow and prune architectures as part of the learning process. The paper introduces a novel approach to growing the architecture of deep neural networks, called the downward-growing neural network (DGNN). The approach can be applied to arbitrary feed-forward deep neural networks. Groups of neurons that negatively affect the performance of the network are selected and grown with the aim of improving the learning and generalization capabilities of the resulting machine. The growing process is realized by replacing these groups of neurons with sub-networks that are trained via ad hoc target propagation techniques. In so doing, growth takes place simultaneously in both the depth and the width of the DGNN architecture. We assess the effectiveness of the DGNN empirically on several UCI datasets, where it significantly improves the average accuracy over a range of established deep neural network approaches and over two popular growing algorithms, namely AdaNet and the cascade correlation neural network.
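To make the growth procedure described above concrete, the following is a minimal, illustrative sketch in PyTorch of the two steps the abstract outlines: selecting a group of neurons that hurt performance and fitting a small sub-network to take their place. All names here (SubNetwork, select_candidate_neurons, grow_step), the saliency-based selection rule, and the regression stand-in for target propagation are assumptions made for illustration only; they are not the paper's actual criteria or training scheme.

```python
import torch
import torch.nn as nn

class SubNetwork(nn.Module):
    """Small MLP that replaces a selected group of neurons,
    so growth adds both depth and width (hypothetical structure)."""
    def __init__(self, n_in, n_out, hidden=8):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(n_in, hidden), nn.Tanh(),
                                 nn.Linear(hidden, n_out))

    def forward(self, x):
        return self.net(x)

def select_candidate_neurons(layer_acts, loss_grad, k=2):
    """Rank hidden units by a simple saliency proxy (|activation * gradient|,
    averaged over the batch) and return the k least useful ones.
    This criterion is an assumption, not the paper's selection rule."""
    saliency = (layer_acts * loss_grad).abs().mean(dim=0)
    return torch.argsort(saliency)[:k]

def grow_step(hidden_inputs, hidden_targets, idx, epochs=200, lr=1e-2):
    """Fit a sub-network to targets for the selected neurons.
    Here `hidden_targets` stands in for the targets that would be obtained
    via target propagation; plain regression is used only as a placeholder."""
    sub = SubNetwork(hidden_inputs.shape[1], len(idx))
    opt = torch.optim.Adam(sub.parameters(), lr=lr)
    for _ in range(epochs):
        opt.zero_grad()
        loss = nn.functional.mse_loss(sub(hidden_inputs), hidden_targets[:, idx])
        loss.backward()
        opt.step()
    return sub
```

In this sketch, a grown network would call select_candidate_neurons on a hidden layer's activations and loss gradients, train a SubNetwork with grow_step, and splice it into the architecture in place of the selected units; the actual DGNN procedure should be taken from the paper itself.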