Quantifying the roles of visual, linguistic, and visual-linguistic complexity in noun and verb acquisition.

Authors

Zhou Yuchen, Tarr Michael J, Yurovsky Daniel

Affiliations

Department of Psychology, Carnegie Mellon University, Pittsburgh, PA, United States.

Neuroscience Institute, Carnegie Mellon University, Pittsburgh, PA, United States.

Publication information

PLoS One. 2025 May 23;20(5):e0321973. doi: 10.1371/journal.pone.0321973. eCollection 2025.

DOI:10.1371/journal.pone.0321973

PMID:40408616

Original article link: https://pmc.ncbi.nlm.nih.gov/articles/PMC12101840/

Abstract

Children often learn the meanings of nouns before they grasp the meanings of verbs. This discrepancy could arise from differences in the complexity of visual characteristics for categories that language describes, the inherent structure of language, or how these two sources of information align. To explore this question, we analyze visual and linguistic representations derived from large-scale pre-trained artificial neural networks of common nouns and verbs, focusing on these three hypotheses about early verb learning. Our findings reveal that verb representations are more variable and less distinct within their domain compared to nouns. When only one example per category is available, the alignment between visual and linguistic representations is weaker for verbs than for nouns. However, with multiple examples (mirroring human language development), this alignment improves significantly for verbs, approaching that of nouns. This suggests that the difficulty in learning verbs is not primarily due to mapping visual events to verb meanings, but rather in forming accurate representations of each verb category. Regression analysis indicates that visual variability significantly impacts verb learning, followed by the alignment of visual and linguistic elements and linguistic variability. Our study provides a quantitative and integrative framework to account for the challenges children face in early word learning, opening new avenues for resolving the longstanding debate on why verbs are harder to learn than nouns.
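The contrast the abstract draws between one example per category and multiple examples can be illustrated with a small simulation. The sketch below is not the authors' pipeline: the abstract does not specify the metrics, so it assumes representational similarity analysis (correlating pairwise cosine similarities across categories) as a stand-in for visual-linguistic alignment, and mean distance to the category centroid as a stand-in for variability. All data are synthetic, and every name and parameter value (`rsa_alignment`, `within_category_variability`, `noise_sd`, and so on) is hypothetical.

```python
import numpy as np


def cosine_similarity_matrix(x):
    """Pairwise cosine similarity between the rows of x."""
    unit = x / np.linalg.norm(x, axis=1, keepdims=True)
    return unit @ unit.T


def rsa_alignment(visual, linguistic):
    """Assumed alignment measure: Pearson correlation between the
    upper triangles of the two category-by-category similarity matrices."""
    upper = np.triu_indices(visual.shape[0], k=1)
    v = cosine_similarity_matrix(visual)[upper]
    l = cosine_similarity_matrix(linguistic)[upper]
    return float(np.corrcoef(v, l)[0, 1])


def within_category_variability(exemplars):
    """Assumed variability measure: mean Euclidean distance of each
    exemplar (row) to the category centroid."""
    centroid = exemplars.mean(axis=0)
    return float(np.linalg.norm(exemplars - centroid, axis=1).mean())


# Synthetic setup (all values are illustrative assumptions).
rng = np.random.default_rng(0)
n_categories, dim, n_exemplars, noise_sd = 20, 50, 50, 3.0

# Each category has a clean visual prototype.
prototypes = rng.normal(size=(n_categories, dim))

# The linguistic space is a rotated copy of the prototypes, so both
# spaces share the same similarity structure by construction.
rotation, _ = np.linalg.qr(rng.normal(size=(dim, dim)))
linguistic = prototypes @ rotation

# Highly variable visual exemplars: prototype plus large noise.
noise = noise_sd * rng.normal(size=(n_categories, n_exemplars, dim))
exemplars = prototypes[:, None, :] + noise

# Alignment from one exemplar per category vs. the mean of many.
single = rsa_alignment(exemplars[:, 0, :], linguistic)
multi = rsa_alignment(exemplars.mean(axis=1), linguistic)

print(f"variability of category 0: {within_category_variability(exemplars[0]):.2f}")
print(f"alignment, one exemplar:   {single:.2f}")
print(f"alignment, {n_exemplars} exemplars:  {multi:.2f}")
```

In this toy setting, averaging many noisy exemplars recovers the category prototype, so alignment with the linguistic space rises even though the mapping between the two spaces never changes. That mirrors the abstract's interpretation that the difficulty lies in forming an accurate category representation rather than in the mapping itself, but the simulation is only an illustration under the stated assumptions.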
