Department of Psychology, University of Pittsburgh, Pittsburgh, PA, USA.
Learning Research and Development Center, University of Pittsburgh, Pittsburgh, PA, USA.
Behav Res Methods. 2019 Jun;51(3):1399-1425. doi: 10.3758/s13428-018-1107-7.
Most words are ambiguous, with interpretation dependent on context. Advancing theories of ambiguity resolution is important for any general theory of language processing, and for resolving inconsistencies in observed ambiguity effects across experimental tasks. Focusing on homonyms (words such as bank, which has the unrelated meanings EDGE OF A RIVER and FINANCIAL INSTITUTION), the present work advances theories and methods for estimating the relative frequency of their meanings, a factor that shapes observed ambiguity effects. We develop a new method for estimating meaning frequency based on human raters' judgments of which meaning of a homonym is evoked in lines of movie and television subtitles. We also replicate and extend a measure of meaning frequency derived from the classification of free associates. We evaluate the internal consistency of these measures, compare them to published estimates based on explicit ratings of each meaning's frequency, and compare each set of norms in predicting performance in lexical and semantic decision mega-studies. All measures have high internal consistency and show agreement with one another, but each is also associated with unique variance, which may be explained by integrating cognitive theories of memory with the demands of different experimental methodologies. To derive frequency estimates, we collected manual classifications of 533 homonyms across more than 50,000 lines of subtitles, and of 357 homonyms across more than 5,000 homonym-associate pairs. This database, publicly available at www.blairarmstrong.net/homonymnorms/, constitutes a novel resource for computational cognitive modeling and computational linguistics, and we offer suggestions around good practices for its use in training and testing models on labeled data.
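As a rough illustration of how relative meaning frequency could be derived from rater classifications, the sketch below counts how often each meaning was assigned to a homonym and normalizes the counts to proportions. It is a minimal sketch under stated assumptions, not the authors' procedure: the function name `meaning_frequencies`, the `(homonym, meaning)` input format, and the toy ratings are all hypothetical and are not taken from the published database.

```python
from collections import Counter, defaultdict


def meaning_frequencies(labels):
    """Return {homonym: {meaning: proportion}} from (homonym, meaning) pairs.

    Each pair stands for one rated item (for example, one subtitle line)
    and the meaning a rater assigned to it. This input format is assumed
    for illustration only.
    """
    counts = defaultdict(Counter)
    for homonym, meaning in labels:
        counts[homonym][meaning] += 1
    return {
        homonym: {m: n / sum(c.values()) for m, n in c.items()}
        for homonym, c in counts.items()
    }


# Hypothetical toy ratings: three lines judged as the financial sense,
# one as the river sense.
ratings = [("bank", "FINANCIAL INSTITUTION")] * 3 + [("bank", "EDGE OF A RIVER")]
freqs = meaning_frequencies(ratings)

# Proportion of the most frequent meaning, one simple summary of how
# unbalanced a homonym's meanings are.
dominance = max(freqs["bank"].values())
```

With this toy input, the financial sense receives a proportion of 0.75 and the river sense 0.25. Real norms would also have to handle disagreement between raters and items that match none of the listed meanings, which this sketch ignores.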