Lee Sangwon, Periwal Vipul, Jo Junghyo
Department of Physics and Astronomy, Seoul National University, Seoul 08826, Korea.
Laboratory of Biological Modeling, National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, Bethesda, Maryland 20892, USA.
Phys Rev E. 2021 Aug;104(2-1):024119. doi: 10.1103/PhysRevE.104.024119.
Inferring dynamics from time series is an important objective in data analysis. In particular, it is challenging to infer stochastic dynamics given incomplete data. We propose an expectation maximization (EM) algorithm that alternates between two steps: the E-step restores missing data points, while the M-step infers an underlying network model from the restored data. Using synthetic data of a kinetic Ising model, we confirm that the algorithm works both for restoring missing data points and for inferring the underlying model. At the initial iteration of the EM algorithm, the inferred model shows better model-data consistency with observed data points than with missing data points. As we keep iterating, however, missing data points show better model-data consistency. We find that demanding equal consistency of observed and missing data points provides an effective stopping criterion for the iteration, preventing it from going beyond the most accurate model inference. Using the EM algorithm and the stopping criterion together, we infer missing data points from time-series data of real neuronal activity. Our method reproduces collective properties of neuronal activity, such as correlations and firing statistics, even when 70% of data points are masked as missing points.
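The EM loop and stopping rule described above can be illustrated with a minimal sketch. It assumes a kinetic Ising model with P(s_i(t+1) | s(t)) = exp(s_i(t+1) H_i(t)) / (2 cosh H_i(t)) and H_i(t) = h_i + sum_j W_ij s_j(t). Several choices here are assumptions and not necessarily the authors' method: the M-step uses plain gradient ascent on the log-likelihood, the E-step is a hard-assignment variant that sets each missing spin to its more likely value under the current model, and "model-data consistency" is taken to be the mean per-point log-likelihood, computed separately for observed and missing points. All function names (`m_step`, `e_step`, `em_restore`, etc.) are hypothetical.

```python
import numpy as np

def loglik_per_point(s, W, h):
    # log P(s_i(t+1) | s(t)) for every point (t+1, i); shape (T-1, N)
    H = s[:-1] @ W.T + h
    return s[1:] * H - np.logaddexp(H, -H)  # logaddexp(H, -H) = log(2 cosh H)

def m_step(s, W, h, lr=0.2, n_iter=500):
    # Assumed M-step: gradient ascent on the mean log-likelihood
    for _ in range(n_iter):
        H = s[:-1] @ W.T + h
        err = s[1:] - np.tanh(H)  # d(log-lik)/dH_i(t)
        W = W + lr * err.T @ s[:-1] / len(err)
        h = h + lr * err.mean(axis=0)
    return W, h

def local_loglik(s, W, h, t, i):
    # Sum of the log-likelihood terms that involve spin s_i(t)
    ll = 0.0
    if t > 0:  # s_i(t) as an output of s(t-1)
        H = h[i] + W[i] @ s[t - 1]
        ll += s[t, i] * H - np.logaddexp(H, -H)
    if t < len(s) - 1:  # s_i(t) as an input to s(t+1)
        H = h + W @ s[t]
        ll += np.sum(s[t + 1] * H - np.logaddexp(H, -H))
    return ll

def e_step(s, mask, W, h):
    # Assumed hard E-step: set each missing spin to its more likely value
    s = s.copy()
    for t, i in zip(*np.nonzero(mask)):
        scores = []
        for v in (-1.0, 1.0):
            s[t, i] = v
            scores.append(local_loglik(s, W, h, t, i))
        s[t, i] = 1.0 if scores[1] > scores[0] else -1.0
    return s

def em_restore(s_obs, mask, max_iter=20, seed=0):
    # mask[t, i] is True where s_obs[t, i] is missing
    rng = np.random.default_rng(seed)
    T, N = s_obs.shape
    s = s_obs.astype(float).copy()
    s[mask] = rng.choice([-1.0, 1.0], size=int(mask.sum()))  # random initial fill
    W, h = np.zeros((N, N)), np.zeros(N)
    history = []
    for _ in range(max_iter):
        W, h = m_step(s, W, h)  # warm start from the previous model
        ll = loglik_per_point(s, W, h)
        # Consistency gap: observed minus missing mean log-likelihood
        gap = ll[~mask[1:]].mean() - ll[mask[1:]].mean()
        history.append(gap)
        if gap <= 0:  # stop once missing points are as consistent as observed ones
            break
        s = e_step(s, mask, W, h)
    return s, W, h, history
```

A usage example on synthetic data might simulate a small network, hide a random 30% of the spins, and pass the result to `em_restore`; the returned `history` records the consistency gap at each iteration, so one can check where it first drops to zero or below.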