The Acquisition of Noun and Verb Categories by Bootstrapping From a Few Known Words: A Computational Model.

Author information

Brusini Perrine, Seminck Olga, Amsili Pascal, Christophe Anne

Affiliations

Department of Psychological Sciences, University of Liverpool, Liverpool, United Kingdom.

Laboratoire de Sciences Cognitives et Psycholinguistique, Centre National de la Recherche Scientifique, École Normale Supérieure/PSL University, Paris, France.

Publication information

Front Psychol. 2021 Aug 19;12:661479. doi: 10.3389/fpsyg.2021.661479. eCollection 2021.

Abstract

While many studies have shown that toddlers are able to detect syntactic regularities in speech, the learning mechanism that allows them to do so remains largely unclear. In this article, we use computational modeling to assess the plausibility of a context-based learning mechanism for the acquisition of nouns and verbs. We hypothesize that infants can assign basic semantic features, such as "is-an-object" and/or "is-an-action," to the very first words they learn, then use these words, the semantic seed, to ground proto-categories of nouns and verbs. The contexts in which these words occur would then be exploited to bootstrap the noun and verb categories: unknown words are assigned to the class that has been observed most frequently in the corresponding context. To test our hypothesis, we designed a series of computational experiments that used French corpora of child-directed speech and semantic seeds of different sizes. We partitioned these corpora into training and test sets: the model extracted the two-word contexts of the seed from the training sets, then used them to predict the syntactic category of content words from the test sets. This very simple algorithm proved highly efficient in a categorization task: even the smallest semantic seed (only 8 nouns and 1 verb known) yields very high precision (~90% for new nouns; ~80% for new verbs). Recall, in contrast, was low for small seeds and increased with seed size. Interestingly, we observed that the contexts used most often by the model featured function words, which is in line with what we know about infants' language development. Crucially, the initialization hypotheses of the learning method evaluated here (a semantic seed and the ability to analyse contexts) are all plausible and fit the developmental literature. While this experiment cannot prove that this learning mechanism is indeed used by infants, it demonstrates the feasibility of a realistic learning hypothesis by using an algorithm that relies on very few computational and memory resources. Altogether, this supports the idea that a probabilistic, context-based mechanism can be very efficient for the acquisition of syntactic categories in infants.
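The mechanism summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that a "two-word context" means the two words immediately preceding the target, that ties between classes yield no prediction, and that sentences arrive already tokenized. The function names (`train`, `predict`, `precision_recall`) and the small English toy corpus are hypothetical and invented for illustration; the study itself used French child-directed speech.

```python
from collections import Counter, defaultdict

PAD = "<s>"  # assumed sentence-start padding token


def train(sentences, seed):
    """Count how often each seed class follows each two-word left context."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        padded = [PAD, PAD] + sentence
        for i in range(2, len(padded)):
            word = padded[i]
            if word in seed:  # only seed words contribute evidence
                context = (padded[i - 2], padded[i - 1])
                counts[context][seed[word]] += 1
    return counts


def predict(counts, context):
    """Return the class seen most often in this context, or None if unseen or tied."""
    seen = counts.get(context)
    if not seen:
        return None
    ranked = seen.most_common(2)
    if len(ranked) == 2 and ranked[0][1] == ranked[1][1]:
        return None  # assumption: abstain on ties
    return ranked[0][0]


def precision_recall(predicted, gold, label):
    """Precision and recall for one class; None predictions count as abstentions."""
    true_pos = sum(p == label and g == label for p, g in zip(predicted, gold))
    n_predicted = sum(p == label for p in predicted)
    n_gold = sum(g == label for g in gold)
    precision = true_pos / n_predicted if n_predicted else 0.0
    recall = true_pos / n_gold if n_gold else 0.0
    return precision, recall


# Hypothetical toy data (invented, not from the paper's corpora).
seed = {"ball": "noun", "dog": "noun", "eat": "verb"}
training = [
    ["look", "at", "the", "ball"],
    ["look", "at", "the", "dog"],
    ["you", "can", "eat"],
]
counts = train(training, seed)

# Unknown test words, each with its two-word left context and gold category.
test_items = [
    (("at", "the"), "noun"),   # e.g. "look at the cat"
    (("you", "can"), "verb"),  # e.g. "you can jump"
    ((PAD, "she"), "verb"),    # e.g. "she sings": context never seen with a seed word
]
predicted = [predict(counts, context) for context, _ in test_items]
gold = [label for _, label in test_items]
verb_precision, verb_recall = precision_recall(predicted, gold, "verb")
```

In this toy run the model abstains on the unseen context, so verb precision stays at 1.0 while verb recall drops to 0.5. This mirrors, on a very small scale, the pattern the abstract reports for small seeds: high precision with low recall.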


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c54e/8416756/b6111342c2ad/fpsyg-12-661479-g0001.jpg
