Department of Mathematics and Computer Science, University of Cagliari, Cagliari, Italy.
Sci Rep. 2020 Dec 7;10(1):21334. doi: 10.1038/s41598-020-76517-0.
Understanding the inner behaviour of multilayer perceptrons during and after training is a goal of paramount importance for many researchers worldwide. This article shows experimentally that relevant patterns emerge upon training, and that these patterns are typically related to the difficulty of the underlying problem. The occurrence of these patterns is highlighted by means of [Formula: see text] diagrams, a 2D graphical tool originally devised to support researchers in classifier performance evaluation and feature assessment. Under the assumption that multilayer perceptrons are powerful engines for feature encoding, hidden layers have been inspected as if they were in fact hosting new input features. Interestingly, some problems appear difficult when dealt with using a single hidden layer, whereas they turn out to be easier upon the addition of further layers. The experimental findings reported in this article give further support to the view that implementing neural architectures with multiple layers may help to boost their generalisation ability. A generic training strategy inspired by some relevant recommendations of deep learning has also been devised. A basic implementation of this strategy has been used throughout the experiments aimed at identifying relevant patterns inside multilayer perceptrons. Further experiments performed in a comparative setting have shown that it could be adopted as a viable alternative to the classical backpropagation algorithm.