Campajola Carlo, Lillo Fabrizio, Tantari Daniele
Scuola Normale Superiore di Pisa, piazza dei Cavalieri 7, 56126 Pisa, Italy.
University of Bologna - Department of Mathematics, piazza di Porta San Donato 5, 40126 Bologna, Italy.
Phys Rev E. 2019 Jun;99(6-1):062138. doi: 10.1103/PhysRevE.99.062138.
We consider the problem of inferring a causal structure from multiple binary time series using the kinetic Ising model, in datasets where a fraction of the observations is missing. Inspired by recent work on mean-field methods for inferring the model with hidden spins, we develop a pseudo-expectation-maximization algorithm that works even under severe data sparsity. The methodology relies on the Martin-Siggia-Rose path integral method with a second-order saddle-point solution, which makes it possible to approximate the log-likelihood in polynomial time and gives as output an estimate of the coupling matrix and of the missing observations. We also propose a recursive version of the algorithm, in which some missing values are replaced by their maximum-likelihood estimates at every iteration, and we show that the method can be combined with sparsification schemes such as lasso regularization or decimation. We test the performance of the algorithm on synthetic data and find interesting properties regarding its dependence on the heterogeneity of the observation frequency across spins, and regarding its behavior when some of the hypotheses required by the saddle-point approximation are violated, such as the small-couplings limit and the assumption of statistical independence between couplings.
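As a minimal sketch of the setting described in the abstract, and not the authors' algorithm, the following Python code simulates a kinetic Ising model with parallel Glauber dynamics, hides a fraction of the observations, and evaluates the log-likelihood for fully observed data. The function names, the Gaussian couplings with standard deviation g / sqrt(N), the absence of external fields, and the per-spin observation probabilities are all illustrative assumptions. The pseudo-expectation-maximization and saddle-point steps are not implemented here.

```python
import math
import random


def simulate_kinetic_ising(n_spins, n_steps, g=1.0, seed=0):
    """Simulate spins s_i(t) in {-1, +1} under parallel Glauber dynamics.

    Assumption for illustration: couplings J_ij are i.i.d. Gaussian with
    standard deviation g / sqrt(n_spins) and there are no external fields.
    """
    rng = random.Random(seed)
    sigma = g / math.sqrt(n_spins)
    J = [[rng.gauss(0.0, sigma) for _ in range(n_spins)] for _ in range(n_spins)]
    s = [rng.choice((-1, 1)) for _ in range(n_spins)]
    traj = [s]
    for _ in range(n_steps - 1):
        # Local field h_i(t) = sum_j J_ij s_j(t)
        fields = [sum(J[i][j] * s[j] for j in range(n_spins)) for i in range(n_spins)]
        # P(s_i(t+1) = +1) = exp(h_i) / (2 cosh h_i) = 1 / (1 + exp(-2 h_i))
        s = [1 if rng.random() < 1.0 / (1.0 + math.exp(-2.0 * h)) else -1
             for h in fields]
        traj.append(s)
    return J, traj


def mask_observations(traj, obs_prob, seed=1):
    """Keep each value of spin i with probability obs_prob[i], else None."""
    rng = random.Random(seed)
    return [[x if rng.random() < obs_prob[i] else None for i, x in enumerate(row)]
            for row in traj]


def log_likelihood(J, traj):
    """Log-likelihood of a fully observed trajectory given couplings J.

    L = sum_t sum_i [ s_i(t+1) h_i(t) - log(2 cosh h_i(t)) ]
    With missing values this sum cannot be evaluated directly, which is
    the difficulty the approximation in the abstract is meant to address.
    """
    n = len(J)
    total = 0.0
    for s_now, s_next in zip(traj[:-1], traj[1:]):
        for i in range(n):
            h = sum(J[i][j] * s_now[j] for j in range(n))
            total += s_next[i] * h - math.log(2.0 * math.cosh(h))
    return total


if __name__ == "__main__":
    J, traj = simulate_kinetic_ising(n_spins=5, n_steps=200)
    # Heterogeneous observation frequencies: some spins are seen more often.
    observed = mask_observations(traj, obs_prob=[0.9, 0.7, 0.5, 0.3, 0.1])
    missing = sum(x is None for row in observed for x in row)
    print("missing fraction:", missing / (5 * 200))
    print("full-data log-likelihood:", log_likelihood(J, traj))
```

Setting different entries in `obs_prob` mimics the heterogeneity of observation frequency mentioned in the abstract, and increasing `g` moves the simulation away from the small-couplings regime.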