Niyogi P, Berwick R C
Center for Biological and Computational Learning, Massachusetts Institute of Technology, Cambridge 02142, USA.
Cognition. 1996 Oct-Nov;61(1-2):161-93. doi: 10.1016/s0010-0277(96)00718-4.
This paper shows how to formally characterize language learning in a finite parameter space, for instance in the principles-and-parameters approach to language, as a Markov structure. New language learning results follow directly; we can explicitly calculate how many positive examples on average ("sample complexity") it will take for a learner to correctly identify a target language with high probability. We show how sample complexity varies with input distributions and learning regimes. In particular, we find that the average time to converge under reasonable language input distributions for a simple three-parameter system first described by Gibson and Wexler (1994) is psychologically plausible, in the range of 100-150 positive examples. We further find that a simple random-step algorithm (that is, jumping from one language hypothesis to another rather than changing one parameter at a time) works faster and always converges to the right target language, in contrast to the single-step, local parameter-setting method advocated in some recent work.
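The computational payoff of the Markov characterization is that expected convergence time becomes a closed-form calculation: treat each grammar hypothesis as a state, make the target grammar absorbing, and solve the standard absorbing-chain hitting-time equations. The sketch below (Python) illustrates only that calculation; the four-state space and every transition probability in it are invented for illustration and do not come from the paper or from Gibson and Wexler's three-parameter system.

```python
import numpy as np

# Hypothetical 4-state hypothesis space; state 3 plays the role of the
# target grammar. P[i][j] is the probability the learner moves from
# hypothesis i to hypothesis j after one positive example. These numbers
# are illustrative placeholders, not values from the paper.
P = np.array([
    [0.6, 0.2, 0.1, 0.1],
    [0.1, 0.5, 0.2, 0.2],
    [0.1, 0.1, 0.5, 0.3],
    [0.0, 0.0, 0.0, 1.0],  # target is absorbing: a correct guess is never abandoned
])

target = 3
transient = [i for i in range(len(P)) if i != target]

# Fundamental-matrix calculation for an absorbing Markov chain:
# expected steps to absorption from each transient state is
# t = (I - Q)^{-1} 1, where Q is P restricted to the transient states.
Q = P[np.ix_(transient, transient)]
t = np.linalg.solve(np.eye(len(Q)) - Q, np.ones(len(Q)))

for state, steps in zip(transient, t):
    print(f"expected positive examples to converge from hypothesis {state}: {steps:.1f}")
```

In this framing, the paper's comparison of learning regimes amounts to comparing the chains they induce: a single-step local learner and a random-step learner yield different transition matrices over the same hypothesis space, hence different expected hitting times, and a chain with unreachable or non-target absorbing states is one that can fail to converge.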