Cueto-Mendoza Eduardo, Kelleher John
School of Computer Science, TU Dublin, Grangegorman, Dublin 7, D07H6K8 Co. Dublin Ireland.
ADAPT Research Centre, School of Computer Science and Statistics, Trinity College Dublin, Dublin 2, Co. Dublin Ireland.
Artif Intell Rev. 2024;57(12):349. doi: 10.1007/s10462-024-10943-8. Epub 2024 Oct 28.
Measuring efficiency in neural network system development is an open research problem. This paper presents an experimental framework to measure the training efficiency of a neural architecture. To demonstrate our approach, we analyze the training efficiency of Convolutional Neural Networks (CNNs) and their Bayesian equivalents (BCNNs) on the MNIST and CIFAR-10 tasks. Our results show that training efficiency decays as training progresses and varies across different stopping criteria for a given neural model and learning task. We also find a non-linear relationship between the training stopping criterion, model size, and training efficiency. Furthermore, we illustrate the potential confounding effects of overtraining on measuring the training efficiency of a neural architecture. Regarding relative training efficiency across architectures, our results indicate that CNNs are more efficient than BCNNs on both datasets. More generally, as a learning task becomes more complex, the relative difference in training efficiency between architectures becomes more pronounced.
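The decay of training efficiency described above can be illustrated with a minimal sketch. The abstract does not specify the paper's exact metric, so this example assumes one simple definition: efficiency at an epoch is the accuracy attained so far divided by the cumulative training cost. The function name, the illustrative accuracy values, and the unit-cost-per-epoch assumption are all hypothetical.

```python
# Hypothetical sketch, not the paper's framework: efficiency defined as
# accuracy attained so far per unit of cumulative training cost.

def training_efficiency(accuracies, cost_per_epoch=1.0):
    """Per-epoch efficiency: accuracy at epoch t / cumulative cost up to t."""
    return [acc / (epoch * cost_per_epoch)
            for epoch, acc in enumerate(accuracies, start=1)]

# A typical saturating accuracy curve (illustrative numbers only).
accs = [0.60, 0.80, 0.88, 0.91, 0.92, 0.925]
effs = training_efficiency(accs)

# Under this definition, efficiency decays as training progresses:
# later epochs buy diminishing accuracy gains for the same cost.
print(effs[0], effs[-1])
```

Because accuracy saturates while cost grows linearly, every later epoch yields a strictly smaller efficiency value, matching the decay trend the abstract reports; a different stopping criterion simply truncates this curve at a different point, which is why measured efficiency varies with the criterion chosen.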