Yuanzhao Zhang, Edmilson Roque dos Santos, Huixin Zhang, Sean P. Cornelius
Santa Fe Institute, 1399 Hyde Park Road, Santa Fe, New Mexico 87501, USA.
Department of Electrical and Computer Engineering, Clarkson University, Potsdam, New York 13699, USA.
Chaos. 2025 Jul 1;35(7). doi: 10.1063/5.0262977.
It has been found recently that more data can, counter-intuitively, hurt the performance of deep neural networks. Here, we show that a more extreme version of the phenomenon occurs in data-driven models of dynamical systems. To elucidate the underlying mechanism, we focus on next-generation reservoir computing (NGRC), a popular framework for learning dynamics from data. We find that, despite learning a better representation of the flow map with more training data, NGRC can adopt an ill-conditioned "integrator" and lose stability. We link this data-induced instability to the auxiliary dimensions created by the delayed states in NGRC. Based on these findings, we propose simple strategies to mitigate the instability, either by increasing regularization strength in tandem with data size, or by carefully introducing noise during training. Our results highlight the importance of proper regularization in data-driven modeling of dynamical systems.
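For readers unfamiliar with the setup, the sketch below illustrates the kind of NGRC model the abstract refers to: a delay-embedded polynomial feature library with a linear readout fit by ridge regression, iterated autonomously at prediction time. It is a minimal illustration under assumed choices (quadratic features, one-step increments as targets); the function names ngrc_fit and ngrc_step are hypothetical and do not come from the paper.

```python
import numpy as np

def ngrc_fit(X, k=2, ridge=1e-6):
    # Fit a minimal NGRC readout: constant + k delayed states + their
    # quadratic monomials, trained by ridge regression to predict the
    # one-step increment x_{t+1} - x_t.  X has shape (T, d).
    T, d = X.shape
    # Delay-embedded linear features [x_t, x_{t-1}, ..., x_{t-k+1}].
    # These delayed states create the auxiliary dimensions the abstract
    # links to the data-induced instability.
    lin = np.hstack([X[k - 1 - i : T - 1 - i] for i in range(k)])
    iu = np.triu_indices(lin.shape[1])
    quad = (lin[:, :, None] * lin[:, None, :])[:, iu[0], iu[1]]
    feats = np.hstack([np.ones((lin.shape[0], 1)), lin, quad])
    Y = X[k:] - X[k - 1 : -1]
    # Ridge-regularized least squares.  The first mitigation strategy in
    # the abstract amounts to increasing `ridge` as the number of rows
    # in `feats` (i.e., the amount of training data) grows.
    G = feats.T @ feats + ridge * np.eye(feats.shape[1])
    W = np.linalg.solve(G, feats.T @ Y).T
    return W, iu

def ngrc_step(W, iu, window):
    # One autonomous prediction step from the last k states
    # (window has shape (k, d), newest state last).
    lin = window[::-1].reshape(1, -1)
    quad = (lin[:, :, None] * lin[:, None, :])[:, iu[0], iu[1]]
    feats = np.hstack([np.ones((1, 1)), lin, quad])
    return window[-1] + (feats @ W.T).ravel()
```

The second mitigation strategy described in the abstract, noise injection, would correspond here to fitting on perturbed data, e.g. `ngrc_fit(X + rng.normal(scale=eps, size=X.shape))` for a small assumed noise level `eps`, rather than on the clean trajectory.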