Department of Electrical and Computer Engineering, Clarkson University, Potsdam, New York 13699, USA and Clarkson Center for Complex Systems Science (C3S2), Potsdam, New York 13699, USA.
Chaos. 2021 Jan;31(1):013108. doi: 10.1063/5.0024890.
Machine learning has become a widely popular and successful paradigm, especially in data-driven science and engineering. A major application problem is data-driven forecasting of future states of a complex dynamical system. Artificial neural networks have evolved as a clear leader among many machine learning approaches, and recurrent neural networks are considered to be particularly well suited for forecasting dynamical systems. In this setting, echo-state networks, or reservoir computers (RCs), have emerged for their simplicity and computational advantages. Instead of fully training the network, an RC trains only the readout weights, by a simple, efficient least squares method. What is perhaps quite surprising is that an RC nonetheless succeeds in making high-quality forecasts, competitive with more intensively trained methods, even if not the leader. It remains an open question why and how an RC works at all despite its randomly selected internal weights. To this end, this work analyzes a further simplified RC, where the internal activation function is an identity function. Our simplification is presented not for the sake of tuning or improving an RC, but for the sake of analysis: the surprise, as we see it, is not that such a machine does not work better, but that such random methods work at all. We explicitly connect the RC with linear activation and linear readout to the well-developed time-series literature on vector autoregressive (VAR) models, which includes representability results via the Wold theorem, and which already performs reasonably for short-term forecasts. In the case of an RC with linear activation and the now popular quadratic readout, we explicitly connect to a nonlinear VAR, which performs quite well. Furthermore, we associate this paradigm with the now widely popular dynamic mode decomposition (DMD); thus, these three are, in a sense, different faces of the same thing. We illustrate our observations with popular benchmark examples, including the Mackey-Glass delay differential equation and the Lorenz63 system.
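To make the simplified construction concrete, the following is a minimal sketch (not code from the paper; the reservoir size, spectral radius, ridge parameter, and all variable names are illustrative assumptions) of an RC with identity activation: randomly chosen, never-trained input and internal weights, with only the linear readout fit by regularized least squares, used to forecast the Lorenz63 system autonomously.

```python
import numpy as np

# Generate Lorenz63 data by simple Euler integration (step size and length are illustrative).
def lorenz63(n_steps, dt=0.01, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    x = np.array([1.0, 1.0, 1.0])
    out = np.empty((n_steps, 3))
    for i in range(n_steps):
        dx = np.array([sigma * (x[1] - x[0]),
                       x[0] * (rho - x[2]) - x[1],
                       x[0] * x[1] - beta * x[2]])
        x = x + dt * dx
        out[i] = x
    return out

rng = np.random.default_rng(0)
data = lorenz63(6000)
train, test = data[:5000], data[5000:]

# Randomly chosen, untrained reservoir: input matrix W_in and internal matrix A,
# with A rescaled so its spectral radius is below 1 (echo-state property).
n_res, n_in = 300, 3
W_in = rng.uniform(-0.1, 0.1, size=(n_res, n_in))
A = rng.uniform(-1.0, 1.0, size=(n_res, n_res))
A *= 0.9 / np.max(np.abs(np.linalg.eigvals(A)))

def run_reservoir(u_seq):
    # Identity activation: r_{t+1} = A r_t + W_in u_t (no tanh), so each state is a
    # linear combination of past inputs -- the link to a VAR representation.
    r = np.zeros(n_res)
    states = np.empty((len(u_seq), n_res))
    for t, u in enumerate(u_seq):
        r = A @ r + W_in @ u
        states[t] = r
    return states

# Train only the readout W_out by ridge-regularized least squares:
# the reservoir state at time t should predict the observation at time t+1.
R = run_reservoir(train[:-1])
Y = train[1:]
ridge = 1e-6
W_out = np.linalg.solve(R.T @ R + ridge * np.eye(n_res), R.T @ Y).T

# Autonomous (closed-loop) forecast: feed each prediction back in as the next input.
r = run_reservoir(train)[-1]
u = train[-1]
preds = np.empty_like(test)
for t in range(len(test)):
    r = A @ r + W_in @ u
    u = W_out @ r
    preds[t] = u

print("RMSE over first 100 autonomously forecast steps:",
      np.sqrt(np.mean((preds[:100] - test[:100]) ** 2)))
```

Because the activation is the identity, the reservoir state is a linear function of the input history, which is precisely what makes the explicit connection between this trained readout and a VAR representation possible; replacing the linear readout with a quadratic one gives the nonlinear VAR variant discussed in the abstract.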