Paul M. B. Vitányi, Nick Chater
National Research Institute for Mathematics and Computer Science, CWI, Science Park 123, 1098 XG, Amsterdam, The Netherlands; University of Amsterdam, The Netherlands.
Behavioural Science Group, Warwick Business School, University of Warwick, Coventry, CV4 7AL, UK.
J Math Psychol. 2017 Feb;76(Pt A):13-24. doi: 10.1016/j.jmp.2016.11.004.
Within psychology, neuroscience and artificial intelligence, there has been increasing interest in the proposal that the brain builds probabilistic models of sensory and linguistic input: that is, that it infers a probabilistic model from a sample. The practical obstacles to such inference are substantial: the brain has limited data and restricted computational resources. But there is a more fundamental question: is inferring a probabilistic model from a sample possible even in principle? We explore this question and find some surprisingly positive and general results. First, for a broad class of probability distributions characterized by computability restrictions, we specify a learning algorithm that will almost surely identify a probability distribution in the limit given a finite i.i.d. sample of sufficient but unknown length. This is similarly shown to hold for sequences generated by a broad class of Markov chains, subject to computability assumptions. The technical tool is the strong law of large numbers. Second, for a large class of dependent sequences, we specify an algorithm which identifies in the limit a computable measure for which the sequence is typical, in the sense of Martin-Löf (there may be more than one such measure). The technical tool is the theory of Kolmogorov complexity. We analyze the associated predictions in both cases. We also briefly consider special cases, including language learning, and wider theoretical implications for psychology.
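The first result rests on the strong law of large numbers: empirical frequencies in an i.i.d. sample converge almost surely to the true probabilities, so a learner that compares candidates against observed frequencies eventually settles on the true distribution. A minimal sketch of this idea, using a small hypothetical candidate class over binary outcomes (not the paper's algorithm, which works over an effective enumeration of computable distributions):

```python
import random

# Hypothetical finite stand-in for a computable class of distributions:
# each candidate is a Bernoulli parameter p = P(outcome is 1).
CANDIDATES = [0.1, 0.25, 0.5, 0.75, 0.9]

def identify(sample):
    """Return the candidate closest to the empirical frequency of 1s.

    By the strong law of large numbers, the empirical frequency converges
    almost surely to the true parameter, so for a sufficiently long sample
    this conjecture stabilizes on the true distribution (identification in
    the limit), assuming the true parameter is in the candidate class.
    """
    freq = sum(sample) / len(sample)
    return min(CANDIDATES, key=lambda p: abs(p - freq))

random.seed(0)
true_p = 0.75
sample = [1 if random.random() < true_p else 0 for _ in range(10_000)]
print(identify(sample))  # almost surely 0.75 for a sample this long
```

The learner never knows *when* it has identified the right distribution — only that its conjecture is almost surely correct from some finite (but unknown) sample length onward, which is exactly the identification-in-the-limit criterion the abstract invokes.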