Abend Omri, Kwiatkowski Tom, Smith Nathaniel J, Goldwater Sharon, Steedman Mark
Informatics, University of Edinburgh, United Kingdom.
Cognition. 2017 Jul;164:116-143. doi: 10.1016/j.cognition.2017.02.009. Epub 2017 Apr 13.
The semantic bootstrapping hypothesis proposes that children acquire their native language through exposure to sentences of the language paired with structured representations of their meaning, whose component substructures can be associated with words and syntactic structures used to express these concepts. The child's task is then to learn a language-specific grammar and lexicon based on (probably contextually ambiguous, possibly somewhat noisy) pairs of sentences and their meaning representations (logical forms). Starting from these assumptions, we develop a Bayesian probabilistic account of semantically bootstrapped first-language acquisition in the child, based on techniques from computational parsing and interpretation of unrestricted text. Our learner jointly models (a) word learning: the mapping between components of the given sentential meaning and lexical words (or phrases) of the language, and (b) syntax learning: the projection of lexical elements onto sentences by universal construction-free syntactic rules. Using an incremental learning algorithm, we apply the model to a dataset of real syntactically complex child-directed utterances and (pseudo) logical forms, the latter including contextually plausible but irrelevant distractors. Taking the Eve section of the CHILDES corpus as input, the model simulates several well-documented phenomena from the developmental literature. In particular, the model exhibits syntactic bootstrapping effects (in which previously learned constructions facilitate the learning of novel words), sudden jumps in learning without explicit parameter setting, acceleration of word-learning (the "vocabulary spurt"), an initial bias favoring the learning of nouns over verbs, and one-shot learning of words and their meanings. The learner thus demonstrates how statistical learning over structured representations can provide a unified account for these seemingly disparate phenomena.
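As a rough illustration of the learning setting described in the abstract (utterances paired with several candidate meanings, only one of which is relevant), the following toy sketch shows how an incremental learner can weight candidate meanings by its current word-meaning estimates and let distractors wash out across situations. It is not the authors' model: it has no syntax or grammar component, it swaps in simple online soft counting with a smoothing pseudo-count, and every name in it (`ToyLearner`, `alpha`, the miniature `CORPUS`, the meaning symbols) is a hypothetical introduced here.

```python
from collections import defaultdict


class ToyLearner:
    """Toy incremental word-meaning learner (an illustrative assumption,
    not the model from the paper)."""

    def __init__(self, alpha=0.1):
        self.alpha = alpha  # smoothing pseudo-count (assumed value)
        self.counts = defaultdict(lambda: defaultdict(float))  # counts[word][symbol]
        self.symbols = set()  # every meaning symbol seen so far

    def prob(self, symbol, word):
        """Smoothed estimate of P(symbol | word) from the soft counts."""
        row = self.counts[word]
        total = sum(row.values())
        return (row.get(symbol, 0.0) + self.alpha) / (
            total + self.alpha * len(self.symbols)
        )

    def observe(self, words, candidates):
        """Process one utterance with its candidate meanings (true + distractors)."""
        for meaning in candidates:
            self.symbols.update(meaning)

        # Score each candidate by how well its symbols explain every word.
        scores = []
        for meaning in candidates:
            score = 1.0
            for w in words:
                score *= sum(self.prob(s, w) for s in meaning) / len(meaning)
            scores.append(score)
        norm = sum(scores)

        # Compute all soft-count increments from the pre-update estimates.
        updates = []
        for meaning, score in zip(candidates, scores):
            q = score / norm  # posterior weight of this candidate meaning
            for w in words:
                z = sum(self.prob(s, w) for s in meaning)
                for s in meaning:
                    updates.append((w, s, q * self.prob(s, w) / z))
        for w, s, delta in updates:
            self.counts[w][s] += delta

    def best_symbol(self, word):
        """Most probable meaning symbol for a word under current estimates."""
        return max(self.symbols, key=lambda s: self.prob(s, word))


# Hypothetical miniature corpus: (words, candidate meanings).
# Each utterance has one plausible meaning and one irrelevant distractor.
CORPUS = [
    (["dog", "runs"], [("DOG", "RUN"), ("CAT", "SLEEP")]),
    (["dog", "eats"], [("BALL", "ROLL"), ("DOG", "EAT")]),
    (["cat", "runs"], [("CAT", "RUN"), ("BALL", "ROLL")]),
    (["cat", "eats"], [("CAT", "EAT"), ("DOG", "SLEEP")]),
]

learner = ToyLearner()
for words, candidates in CORPUS:
    learner.observe(words, candidates)  # single incremental pass

for word in ["dog", "cat", "eats"]:
    print(word, "->", learner.best_symbol(word))
```

Because each utterance is processed once and then discarded, the sketch is incremental in the same loose sense as the abstract's learner. Candidate meanings that contain already-associated symbols receive more weight, so early evidence speeds up later learning, but the sketch makes no attempt to reproduce the paper's reported developmental effects.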