Department of Cognitive Science, University of California, San Diego (UCSD), 9500 Gilman Drive, La Jolla, CA, 92093-0515, USA.
UCSD Center for Research in Language, La Jolla, CA, USA.
Behav Res Methods. 2023 Jun;55(4):1537-1557. doi: 10.3758/s13428-022-01869-6. Epub 2022 Jun 10.
For any research program examining how ambiguous words are processed in broader linguistic contexts, a first step is to establish factors relating to the frequency balance or dominance of those words' multiple meanings, as well as the similarity of those meanings to one another. Homonyms (words with divergent meanings) are one ambiguous word type commonly used in psycholinguistic research. In contrast, although polysemes (words with multiple related senses) are far more common in English, they have been used less often as tools for understanding one-to-many word-to-meaning mappings. The current paper details two norming studies of a relatively large number of ambiguous English words. The first reports offline dominance norming for 547 homonyms and polysemes via a free association task suitable for words across the ambiguity continuum, with the goal of identifying words with more equibiased meanings. The second assesses offline meaning similarity for a subset of 318 words (including homonyms, unambiguous words, and polysemes divided into regular and irregular types) using a novel, continuous rating method that relies on the linguistic phenomenon of zeugma. In addition, we conduct computational analyses on the human similarity norming data using the BERT pretrained neural language model (Devlin et al., 2018, BERT: Pre-training of deep bidirectional transformers for language understanding. ArXiv Preprint. arXiv:1810.04805) to evaluate factors that may explain variance beyond that accounted for by dictionary-criteria ambiguity categories. Finally, we make available the summarized item dominance values and similarity ratings in appendices (see supplementary material), as well as individual item and participant norming data, which can be accessed online (https://osf.io/g7fmv/).
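As a rough illustration of what a dominance value can look like, the sketch below scores a word from free-association responses that have already been coded to a meaning. It is not the paper's scoring procedure: it assumes the common convention that dominance is the share of responses coded to the most frequent meaning, and the function names, the example labels, and the 0.65 cutoff for "equibiased" are all hypothetical.

```python
from collections import Counter


def dominance(meaning_labels):
    """Return (dominant_meaning, dominance_score) for one word.

    meaning_labels holds one coded meaning label per free-association
    response. The score is the proportion of responses coded to the
    most frequent meaning (an assumed convention, not the paper's).
    """
    if not meaning_labels:
        raise ValueError("need at least one coded response")
    counts = Counter(meaning_labels)
    meaning, n = counts.most_common(1)[0]
    return meaning, n / len(meaning_labels)


def is_equibiased(meaning_labels, threshold=0.65):
    """Flag a word whose dominant meaning does not exceed the cutoff.

    The default threshold is a hypothetical value for illustration only.
    """
    _, score = dominance(meaning_labels)
    return score <= threshold


# Hypothetical coded responses for one ambiguous word.
responses = ["money"] * 6 + ["river"] * 4
print(dominance(responses))
print(is_equibiased(responses))
```

A word whose responses split close to evenly across meanings gets a score near 0.5 and would be flagged as equibiased under this cutoff; a strongly biased word scores near 1.0.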