Institute for Logic, Language and Computation, University of Amsterdam.
Cogn Sci. 2009 Jul;33(5):752-93. doi: 10.1111/j.1551-6709.2009.01031.x. Epub 2009 Apr 8.
While rules and exemplars are usually viewed as opposites, this paper argues that they form end points of the same distribution. By representing both rules and exemplars as (partial) trees, we can take into account the fluid middle ground between the two extremes. This insight is the starting point for a new theory of language learning that is based on the following idea: If a language learner does not know which phrase-structure trees should be assigned to initial sentences, s/he allows (implicitly) for all possible trees and lets linguistic experience decide which is the "best" tree for each sentence. The best tree is obtained by maximizing "structural analogy" between a sentence and previous sentences, which is formalized by the most probable shortest combination of subtrees from all trees of previous sentences. Corpus-based experiments with this model on the Penn Treebank and the CHILDES database indicate that it can learn both exemplar-based and rule-based aspects of language, ranging from phrasal verbs to auxiliary fronting. By having learned the syntactic structures of sentences, we have also learned the grammar implicit in these structures, which can in turn be used to produce new sentences. We show that our model mimics children's language development from item-based constructions to abstract constructions, and that the model can simulate some of the errors made by children in producing complex questions.
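As an illustration of the "shortest combination of subtrees" idea, the following is a minimal sketch and not the authors' implementation. It assumes trees are encoded as nested Python tuples, it scores one given candidate tree instead of searching over all possible trees for a sentence, and it leaves out the probabilistic tie-breaking among equally short derivations. The names `fragments`, `fragment_bank`, and `shortest_derivation_length` are hypothetical.

```python
from itertools import product

# A tree is a tuple (label, child, ...); a child is a word (str) or a tree.
# In a fragment, a 1-tuple (label,) marks an open substitution site.

def fragments(node):
    """Return all (fragment, open_sites) pairs rooted at this node."""
    label, children = node[0], node[1:]
    options = []
    for child in children:
        if isinstance(child, str):
            options.append([(child, [])])          # words are always kept
        else:
            cut = [((child[0],), [child])]         # cut here: leave a site
            options.append(cut + fragments(child)) # or keep expanding
    result = []
    for combo in product(*options):
        frag = (label,) + tuple(f for f, _ in combo)
        sites = [s for _, ss in combo for s in ss]
        result.append((frag, sites))
    return result

def all_nodes(tree):
    """Yield every non-word node of a tree."""
    yield tree
    for child in tree[1:]:
        if not isinstance(child, str):
            yield from all_nodes(child)

def fragment_bank(corpus):
    """Collect every fragment of every tree seen so far."""
    return {frag for tree in corpus
            for node in all_nodes(tree)
            for frag, _ in fragments(node)}

def shortest_derivation_length(node, bank):
    """Fewest known fragments needed to build the tree, or None."""
    best = None
    for frag, sites in fragments(node):
        if frag not in bank:
            continue
        cost = 1
        for site in sites:
            sub = shortest_derivation_length(site, bank)
            if sub is None:
                cost = None
                break
            cost += sub
        if cost is not None and (best is None or cost < best):
            best = cost
    return best

# Toy "previous sentences" (an assumption for illustration only).
t1 = ("S", ("NP", "Mary"), ("VP", ("V", "sees"), ("NP", "John")))
t2 = ("S", ("NP", "John"), ("VP", ("V", "walks")))
bank = fragment_bank([t1, t2])

target = ("S", ("NP", "John"), ("VP", ("V", "sees"), ("NP", "Mary")))
print(shortest_derivation_length(target, bank))  # 3 under this toy corpus
```

A tree already seen in full needs a single fragment (the exemplar end of the scale), while a novel tree has to be assembled from several smaller, more rule-like pieces. In this toy corpus the target is built from the frame `S(NP, VP(V(sees), NP))` plus the two noun phrases.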