Department of Chemistry and Center for Theoretical Biological Physics, Rice University, Houston, Texas 77005, USA.
J Chem Phys. 2018 Jun 28;148(24):241723. doi: 10.1063/1.5018409.
With the rapid increase of available data for complex systems, there is great interest in extracting physically relevant information from massive datasets. Recently, a framework called Sparse Identification of Nonlinear Dynamics (SINDy) has been introduced to identify the governing equations of dynamical systems from simulation data. In this study, we extend SINDy to stochastic dynamical systems, which are frequently used to model biophysical processes. We prove the asymptotic correctness of stochastic SINDy in the infinite data limit, both in the original and projected variables. We discuss algorithms to solve the sparse regression problem arising from the practical implementation of SINDy and show that cross-validation is an essential tool to determine the right level of sparsity. We demonstrate the proposed methodology on two test systems: diffusion in a one-dimensional potential and the projected dynamics of a two-dimensional diffusion process.
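To make the idea of sparse regression for a stochastic system concrete, the following is a minimal sketch rather than the implementation used in the study. Everything specific in it is an assumption chosen for illustration: the test system dX = (X - X^3) dt + sqrt(2) dW, the finite-difference estimate of the drift, the polynomial library up to degree 3, the hand-picked threshold, and the helper names `simulate` and `stlsq`. Sequentially thresholded least squares is one common solver for SINDy-type problems; the abstract does not say which solver the authors prefer.

```python
import numpy as np


def simulate(n_steps=200_000, dt=0.01, seed=0):
    """Euler-Maruyama trajectory of dX = (X - X^3) dt + sqrt(2) dW.

    The double-well drift and unit diffusion constant are assumptions
    made for this illustration only.
    """
    rng = np.random.default_rng(seed)
    noise = np.sqrt(2.0 * dt) * rng.standard_normal(n_steps)
    x = np.empty(n_steps + 1)
    x[0] = 0.0
    for n in range(n_steps):
        x[n + 1] = x[n] + (x[n] - x[n] ** 3) * dt + noise[n]
    return x


def stlsq(theta, y, threshold, n_iter=10):
    """Sequentially thresholded least squares (hypothetical helper).

    Repeatedly zeroes coefficients whose magnitude is below `threshold`
    and refits the remaining ones by ordinary least squares.
    """
    coef = np.linalg.lstsq(theta, y, rcond=None)[0]
    for _ in range(n_iter):
        small = np.abs(coef) < threshold
        coef[small] = 0.0
        active = ~small
        if not active.any():
            break
        coef[active] = np.linalg.lstsq(theta[:, active], y, rcond=None)[0]
    return coef


dt = 0.01
x = simulate(dt=dt)
dx = np.diff(x)

# Library of candidate functions evaluated along the trajectory:
# columns are 1, x, x^2, x^3.
theta = np.vander(x[:-1], 4, increasing=True)

# The conditional mean of dx / dt approximates the drift, so regressing
# dx / dt on the library gives the drift coefficients.
drift_coef = stlsq(theta, dx / dt, threshold=0.3)

# The mean squared increment approximates 2 * D * dt for constant diffusion D.
diffusion_est = np.mean(dx ** 2) / (2.0 * dt)

print("drift coefficients (1, x, x^2, x^3):", drift_coef)
print("estimated diffusion constant:", diffusion_est)
```

The threshold is fixed by hand here. In line with the abstract, a more careful treatment would select the sparsity level by cross-validation, for example by fitting on one part of the trajectory and comparing the residual on a held-out part for several thresholds.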