Shafaei-Bajestan Elnaz, Moradipour-Tari Masoumeh, Uhrig Peter, Baayen R Harald
Department of General and Computational Linguistics, University of Tübingen, Wilhelmstraße 19, Tübingen, 72074 Baden-Württemberg Germany.
Department of English and American Studies, Friedrich-Alexander-Universität Erlangen-Nürnberg, Bismarckstraße 1, Erlangen, 91054 Bayern Germany.
Morphology (Dordr). 2024;34(4):369-413. doi: 10.1007/s11525-024-09428-9. Epub 2024 Jul 12.
Using distributional semantics, we show that English nominal pluralization exhibits semantic clusters. For instance, the shift in semantic space from singular to plural differs depending on whether a word denotes a fruit or an animal. Languages with extensive noun classes, such as Swahili and Kiowa, distinguish between these kinds of words in their morphology. In English, although such distinctions are not marked morphologically, plural semantics also varies by semantic class. We introduce a semantically informed method, CosClassAvg, and compare it with two other methods: one that applies a fixed shift from singular to plural, and one that creates plural vectors from singular vectors using a linear mapping (FRACSS). Compared to FRACSS, CosClassAvg predicted plural vectors that were more similar to the corpus-extracted plural vectors in vector length, but somewhat less similar in orientation. Both FRACSS and CosClassAvg outperform the fixed-shift method, which does not do justice to the intricacies of English plural semantics. A computational modeling study revealed that the observed differences between the plural semantics generated by these three methods carry over to how well a computational model of the listener can understand previously unencountered plural forms. Among all methods, CosClassAvg provides a good balance in the trade-off between productivity (being able to understand novel plural forms) and faithfulness to corpus-extracted plural vectors (i.e., understanding the particulars of the meaning of a given plural form).
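The kinds of plural predictors compared in the abstract can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' code. `fixed_shift` adds one average singular-to-plural offset to every singular. `linear_map` fits a least-squares matrix in the spirit of FRACSS (the paper's estimation details may differ). `class_shift` is a hypothetical, simplified stand-in for class-sensitive methods: it computes a separate average shift per semantic class, with class labels supplied in advance, and it is not the paper's CosClassAvg. Measuring orientation by cosine similarity and length by the Euclidean norm is also an assumption, and the vectors are synthetic toy data rather than corpus-extracted embeddings.

```python
import numpy as np


def fixed_shift(S, P):
    # One average singular-to-plural offset, added to every singular vector.
    shift = (P - S).mean(axis=0)
    return S + shift


def linear_map(S, P):
    # Least-squares matrix F with S @ F ≈ P (a FRACSS-like linear mapping; assumption).
    F, *_ = np.linalg.lstsq(S, P, rcond=None)
    return S @ F


def class_shift(S, P, classes):
    # Hypothetical simplification: a separate average offset for each semantic class.
    # This is NOT the paper's CosClassAvg; class labels are assumed to be given.
    pred = np.empty_like(S)
    for c in np.unique(classes):
        idx = classes == c
        pred[idx] = S[idx] + (P[idx] - S[idx]).mean(axis=0)
    return pred


def length_and_cosine(pred, P):
    # Mean absolute difference in vector length, and mean cosine (orientation).
    n_pred = np.linalg.norm(pred, axis=1)
    n_obs = np.linalg.norm(P, axis=1)
    len_diff = np.abs(n_pred - n_obs).mean()
    cosine = (np.sum(pred * P, axis=1) / (n_pred * n_obs)).mean()
    return len_diff, cosine


# Synthetic toy data: two classes whose plurals shift in opposite directions.
rng = np.random.default_rng(0)
n, d = 40, 5
S = rng.normal(size=(n, d))
classes = np.repeat(["fruit", "animal"], n // 2)
offsets = {"fruit": np.full(d, 1.0), "animal": np.full(d, -1.0)}
P = S + np.array([offsets[c] for c in classes]) + 0.05 * rng.normal(size=(n, d))

results = {
    "fixed shift": length_and_cosine(fixed_shift(S, P), P),
    "linear map": length_and_cosine(linear_map(S, P), P),
    "class shift": length_and_cosine(class_shift(S, P, classes), P),
}
for name, (len_diff, cosine) in results.items():
    print(f"{name:12s} length diff = {len_diff:.3f}  mean cosine = {cosine:.3f}")
```

Because the toy plurals are built with opposite shifts for the two classes, the single averaged shift nearly cancels out, which mirrors in miniature why one fixed shift cannot capture class-dependent plural semantics. All methods here are fitted and evaluated on the same toy items, so the sketch says nothing about understanding previously unencountered plurals.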
The online version contains supplementary material available at 10.1007/s11525-024-09428-9.