Abarbanel Henry D I, Rozdeba Paul J, Shirman Sasha
Marine Physical Laboratory, Scripps Institution of Oceanography, and Department of Physics, University of California, San Diego, La Jolla, CA 92093-0374, U.S.A.
Department of Physics, University of California, San Diego, La Jolla, CA 92093-0374, U.S.A.
Neural Comput. 2018 Aug;30(8):2025-2055. doi: 10.1162/neco_a_01094. Epub 2018 Jun 12.
We formulate an equivalence between machine learning and statistical data assimilation as used widely in the physical and biological sciences. The correspondence is that layer number in a feedforward artificial network setting is the analog of time in the data assimilation setting. This connection has been noted in the machine learning literature. We add a perspective that expands on how methods from statistical physics and aspects of Lagrangian and Hamiltonian dynamics play a role in how networks can be trained and designed. Within the discussion of this equivalence, we show that adding more layers (making the network deeper) is analogous to adding temporal resolution in a data assimilation framework. Extending this equivalence to recurrent networks is also discussed. We explore how one can find a candidate for the global minimum of the cost function in the machine learning context using a method from data assimilation. Calculations on simple models from both sides of the equivalence are reported. Also discussed is a framework in which the time or layer label is taken to be continuous, providing a differential equation, the Euler-Lagrange equation with its boundary conditions, as a necessary condition for a minimum of the cost function. This shows that the problem being solved is a two-point boundary value problem familiar from the discussion of variational methods. The use of continuous layers is denoted "deepest learning." These problems respect a symplectic symmetry in continuous-layer phase space. Both Lagrangian and Hamiltonian versions of these problems are presented. Their well-studied implementation in discrete time/layers, while respecting the symplectic structure, is also addressed. The Hamiltonian version provides a direct rationale for backpropagation as a solution method for a certain two-point boundary value problem.
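The layer-number-as-time correspondence, and the claim that adding layers is analogous to refining temporal resolution, can be illustrated with a minimal sketch (not from the paper): a residual network whose layers each apply x ← x + h·f(x) is the explicit Euler discretization of the continuous-layer flow dx/dt = f(x), so a deeper stack with a smaller per-layer step tracks the continuous dynamics more closely. The vector field `f` below is a hypothetical choice for illustration only.

```python
import numpy as np

def f(x):
    # hypothetical smooth layer dynamics, chosen only for illustration
    return np.tanh(x) - 0.5 * x

def residual_net(x0, depth, T=4.0):
    # `depth` residual layers, each an explicit Euler step of size T/depth;
    # layer index plays the role of time in the data assimilation analogy
    h = T / depth
    x = x0
    for _ in range(depth):
        x = x + h * f(x)
    return x

x0 = np.array([1.0, -0.7])
shallow = residual_net(x0, depth=8)
deep = residual_net(x0, depth=1024)
deeper = residual_net(x0, depth=8192)
# deeper stacks (finer "temporal" resolution) converge toward the
# continuous-layer flow, so deep and deeper agree far better than shallow
print(np.linalg.norm(deep - deeper) < np.linalg.norm(shallow - deeper))
```

Under this reading, "making the network deeper" at fixed total integration "time" T is exactly a refinement of the time grid in a data assimilation problem.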
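The role of the symplectic symmetry in the discrete time/layer setting can be sketched with a standard example (not taken from the paper): for a harmonic oscillator Hamiltonian H = p²/2 + q²/2, the leapfrog (Störmer-Verlet) scheme respects the symplectic phase-space structure and keeps the energy bounded, whereas a non-symplectic explicit Euler step lets it drift without bound. All step sizes and iteration counts below are illustrative assumptions.

```python
import numpy as np

def leapfrog(q, p, h, steps):
    # symplectic (Stormer-Verlet) integration of H = p^2/2 + q^2/2
    for _ in range(steps):
        p = p - 0.5 * h * q      # half kick: dp/dt = -dH/dq = -q
        q = q + h * p            # drift:     dq/dt =  dH/dp =  p
        p = p - 0.5 * h * q      # half kick
    return q, p

def euler(q, p, h, steps):
    # non-symplectic explicit Euler, for comparison
    for _ in range(steps):
        q, p = q + h * p, p - h * q
    return q, p

H = lambda q, p: 0.5 * (q**2 + p**2)
q0, p0, h, steps = 1.0, 0.0, 0.05, 2000
E0 = H(q0, p0)
q_lf, p_lf = leapfrog(q0, p0, h, steps)
q_eu, p_eu = euler(q0, p0, h, steps)
# leapfrog energy error stays small and bounded; Euler energy blows up
print(abs(H(q_lf, p_lf) - E0), abs(H(q_eu, p_eu) - E0))
```

This is the sense in which a discrete-layer implementation can "respect the symplectic structure" of the continuous-layer Hamiltonian formulation: the discrete update is itself a symplectic map, not merely an approximation of one.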