Verzelli Pietro, Alippi Cesare, Livi Lorenzo
Faculty of Informatics, Università della Svizzera Italiana, Lugano, 69000, Switzerland.
Department of Electronics, Information and Bioengineering, Politecnico di Milano, Milan, 20133, Italy.
Sci Rep. 2019 Sep 25;9(1):13887. doi: 10.1038/s41598-019-50158-4.
Among the various architectures of Recurrent Neural Networks, Echo State Networks (ESNs) emerged due to their simplified and inexpensive training procedure. These networks are known to be sensitive to the setting of hyper-parameters, which critically affect their behavior. Results show that their performance is usually maximized in a narrow region of hyper-parameter space called the edge of criticality. Finding such a region requires searching hyper-parameter space in a sensible way: hyper-parameter configurations marginally outside such a region might yield networks exhibiting fully developed chaos, hence producing unreliable computations. The performance gain due to optimizing hyper-parameters can be studied by considering the memory-nonlinearity trade-off, i.e., the fact that increasing the nonlinear behavior of the network degrades its ability to remember past inputs, and vice versa. In this paper, we propose a model of ESNs that eliminates critical dependence on hyper-parameters, resulting in networks that provably cannot enter a chaotic regime and, at the same time, exhibit nonlinear behavior in phase space characterized by a large memory of past inputs, comparable to that of linear networks. Our contribution is supported by experiments corroborating our theoretical findings, showing that the proposed model displays dynamics that are rich enough to approximate many common nonlinear systems used for benchmarking.
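For readers unfamiliar with ESNs, the following is a minimal sketch of a conventional tanh-based ESN, not the model proposed in the paper. It is meant only to illustrate why training is inexpensive (the recurrent weights stay fixed and only a linear readout is fitted) and where a hyper-parameter such as the spectral radius enters. The function names, reservoir size, input range, ridge coefficient, and the delayed-recall task are all assumptions chosen for illustration.

```python
import numpy as np

def make_reservoir(n_units, spectral_radius, rng):
    # Random recurrent matrix rescaled to a chosen spectral radius,
    # a hyper-parameter that standard ESNs are sensitive to.
    W = rng.standard_normal((n_units, n_units))
    W *= spectral_radius / np.max(np.abs(np.linalg.eigvals(W)))
    return W

def run_reservoir(W, W_in, inputs):
    # Drive the fixed (untrained) reservoir with a scalar input sequence
    # and record the state after every step.
    x = np.zeros(W.shape[0])
    states = np.empty((len(inputs), W.shape[0]))
    for k, u in enumerate(inputs):
        x = np.tanh(W @ x + W_in * u)
        states[k] = x
    return states

def train_readout(states, targets, ridge=1e-6):
    # Ridge regression in closed form: the only trained part of the network.
    A = states.T @ states + ridge * np.eye(states.shape[1])
    return np.linalg.solve(A, states.T @ targets)

rng = np.random.default_rng(0)
n_units, delay, washout = 100, 5, 100   # illustrative values only

W = make_reservoir(n_units, 0.9, rng)
W_in = rng.uniform(-1.0, 1.0, n_units)
u = rng.uniform(-0.5, 0.5, 1000)

states = run_reservoir(W, W_in, u)

# Memory task: reconstruct the input seen `delay` steps earlier.
X = states[washout:]
y = u[washout - delay : len(u) - delay]
w_out = train_readout(X, y)
pred = X @ w_out
```

Changing the value passed to `make_reservoir`, or rescaling `W_in`, and re-fitting the readout is one simple way to observe the hyper-parameter sensitivity and the memory-nonlinearity trade-off described above.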