Westbury Chris, Yang Michelle, Anderson Kris
Department of Psychology, University of Alberta, P220 Biological Sciences Building, Edmonton, AB, T6G 2E9, Canada.
Department of Psychology, McGill University, 2001 McGill College Ave, Montreal, QC, H3A 1G1, Canada.
Psychon Bull Rev. 2025 Feb;32(1):203-225. doi: 10.3758/s13423-024-02551-y. Epub 2024 Aug 22.
Osgood, Suci, and Tannenbaum were the first to attempt to identify the principal components of semantics by applying dimensionality reduction to a high-dimensional model of semantics constructed from human judgments of word relatedness. Modern word-embedding models analyze patterns of word co-occurrence to construct higher-dimensional models of semantics that can similarly be subjected to dimensionality reduction. Hollis and Westbury characterized the first eight principal components (PCs) of a word-embedding model by correlating them with several well-known lexical measures, such as logged word frequency, age of acquisition, valence, arousal, dominance, and concreteness. Their results showed some clear differentiation of interpretation between the PCs. Here, we extend this work by analyzing a larger word-embedding matrix using semantic measures initially derived from subjective inspection of the PCs. We then use quantitative analysis to confirm the utility of these subjective measures for predicting PC values and cross-validate them on two word-embedding matrices developed on distinct corpora. Several semantic and word-class measures are strongly predictive of early PC values, including first-person and second-person verbs, personal relevance of abstract and concrete words, affect terms, and names of places and people. The predictors of the lowest-magnitude PCs generalized well to word-embedding matrices constructed from separate corpora, including matrices constructed using different word-embedding methods. The predictive categories we describe are consistent with Wittgenstein's argument that an autonomous level of social interaction grounds linguistic meaning.
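The general procedure summarized above (reduce a word-embedding matrix to its leading principal components, then correlate each PC with a lexical measure) can be illustrated with a minimal sketch. This is not the authors' pipeline: the embedding matrix is random toy data, the "lexical measure" is a synthetic variable planted along one embedding dimension, and the function names `principal_components` and `correlate_with_measure` are hypothetical.

```python
import numpy as np

def principal_components(embeddings, n_components=8):
    """Return PC scores (words x n_components) and explained-variance ratios."""
    # Center each embedding dimension before decomposing.
    centered = embeddings - embeddings.mean(axis=0)
    # SVD of the centered matrix; columns of u scaled by s are the PC scores.
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]
    explained = (s ** 2) / np.sum(s ** 2)
    return scores, explained[:n_components]

def correlate_with_measure(scores, measure):
    """Pearson r between each PC's scores and one lexical measure."""
    return np.array([np.corrcoef(scores[:, k], measure)[0, 1]
                     for k in range(scores.shape[1])])

# Toy data (assumption): 500 "words" in a 50-dimensional embedding space.
rng = np.random.default_rng(0)
n_words, n_dims = 500, 50
measure = rng.normal(size=n_words)        # stand-in for, e.g., valence ratings
embeddings = rng.normal(size=(n_words, n_dims))
embeddings[:, 0] += 5.0 * measure         # plant the measure along one direction

scores, explained = principal_components(embeddings)
r = correlate_with_measure(scores, measure)
print(np.round(r, 2))
```

With real data, `embeddings` would be the rows of a trained word-embedding matrix and `measure` a published norm (such as concreteness) aligned to the same word list; the sign of each correlation is arbitrary because the direction of a principal axis is not determined.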