Okada Makoto, Yamanishi Kenji, Masuda Naoki
Graduate School of Information Science and Technology, The University of Tokyo, 7-3-1 Hongo Bunkyo, Tokyo 113-8656, Japan.
Department of Engineering Mathematics, University of Bristol, Woodland Road, Clifton, Bristol BS8 1UB, UK.
R Soc Open Sci. 2020 Feb 26;7(2):191643. doi: 10.1098/rsos.191643. eCollection 2020 Feb.
Inter-event times of various human behaviours are apparently non-Poissonian and obey long-tailed distributions, as opposed to the exponential distributions that correspond to Poisson processes. It has been suggested that human individuals may switch between different states, in each of which they are assumed to generate events obeying a Poisson process. If this is the case, inter-event times should approximately obey a mixture of exponential distributions with different parameter values. In the present study, we introduce the minimum description length principle to compare mixtures of exponential distributions with different numbers of components (i.e. constituent exponential distributions). Because these distributions violate the identifiability property, one is mathematically not allowed to apply the Akaike or Bayes information criteria to their maximum-likelihood estimator to carry out model selection. We overcome this theoretical barrier by applying a minimum description length principle to joint likelihoods of the data and latent variables. We show that, in various datasets, mixtures of exponential distributions with a few components are selected over more complex mixtures, and that the fitting accuracy is comparable to that of state-of-the-art algorithms for fitting power-law distributions to data. Our results lend support to Poissonian explanations of apparently non-Poissonian human behaviour.
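As a rough illustration of the model class discussed above, the following Python sketch draws inter-event times from a mixture of exponential distributions, with density f(t) = Σ_j w_j λ_j exp(-λ_j t), and fits such a mixture by a plain expectation-maximization (EM) loop. It is not the minimum description length procedure of the study: the function names, the quantile-based initialisation, and the fixed iteration count are all assumptions made for this example, and the sketch performs only maximum-likelihood fitting for a given number of components.

```python
import numpy as np


def sample_exp_mixture(weights, rates, size, rng):
    """Draw inter-event times: pick a latent state, then an exponential time."""
    weights = np.asarray(weights, dtype=float)
    rates = np.asarray(rates, dtype=float)
    states = rng.choice(len(weights), size=size, p=weights)
    return rng.exponential(scale=1.0 / rates[states])


def mixture_pdf(t, weights, rates):
    """Density f(t) = sum_j w_j * lambda_j * exp(-lambda_j * t)."""
    t = np.asarray(t, dtype=float)[..., None]
    weights = np.asarray(weights, dtype=float)
    rates = np.asarray(rates, dtype=float)
    return np.sum(weights * rates * np.exp(-rates * t), axis=-1)


def log_likelihood(t, weights, rates):
    """Log-likelihood of the observed times under the mixture."""
    return float(np.sum(np.log(mixture_pdf(t, weights, rates))))


def fit_exp_mixture_em(t, k, n_iter=200):
    """Fit a k-component exponential mixture by EM (illustrative only).

    Initial rates are reciprocals of evenly spaced sample quantiles;
    this initialisation is an assumption of the sketch.
    """
    t = np.asarray(t, dtype=float)
    quantiles = np.quantile(t, (np.arange(k) + 0.5) / k)
    rates = 1.0 / np.maximum(quantiles, 1e-12)
    weights = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        # E-step: posterior probability that each time came from each component.
        dens = weights * rates * np.exp(-rates * t[:, None])
        resp = dens / dens.sum(axis=1, keepdims=True)
        # M-step: update mixing weights and rates from the responsibilities.
        weights = resp.mean(axis=0)
        rates = resp.sum(axis=0) / (resp * t[:, None]).sum(axis=0)
    return weights, rates
```

For example, `fit_exp_mixture_em(sample_exp_mixture([0.7, 0.3], [10.0, 0.5], 20000, np.random.default_rng(0)), k=2)` returns the fitted weights and rates of a two-component mixture. Choosing the number of components `k` is exactly the model-selection step that the study addresses with the minimum description length principle, and it is deliberately left out here.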