Center for Data Science, New York University.
Department of Psychology, New York University.
Cogn Sci. 2022 Apr;46(4):e13122. doi: 10.1111/cogs.13122.
To learn the mappings from words to referents, children must integrate co-occurrence information across individually ambiguous pairs of scenes and utterances, a challenge known as cross-situational word learning. In machine learning, recent multimodal neural networks have been shown to learn meaningful visual-linguistic mappings from cross-situational data, as needed to solve problems such as image captioning and visual question answering. These networks are potentially appealing as cognitive models because they can learn from raw visual and linguistic stimuli, something previous cognitive models have not addressed. In this paper, we examine whether recent machine learning approaches can help explain various behavioral phenomena from the psychological literature on cross-situational word learning. We consider two variants of a multimodal neural network architecture and examine seven different phenomena associated with cross-situational word learning and word learning more generally. Our results show that these networks can learn word-referent mappings from a single epoch of training, mimicking the amount of training commonly found in cross-situational word learning experiments. Additionally, these networks capture some, but not all, of the phenomena we studied, with all of the failures related to reasoning via mutual exclusivity. These results provide insight into the kinds of phenomena that arise naturally from relatively generic neural network learning algorithms, and into which word learning phenomena require additional inductive biases.