Department of Computing Science, University of Alberta, 3-57 Athabasca Hall, Edmonton, AB, T6G 2E8, Canada.
Department of Psychology, University of Alberta, P217 Biological Sciences Building, Edmonton, AB, T6G 2E9, Canada.
Behav Res Methods. 2018 Feb;50(1):115-133. doi: 10.3758/s13428-017-1009-0.
Large-scale semantic norms have become both prevalent and influential in recent psycholinguistic research. However, little attention has been directed towards understanding the methodological best practices of such norm collection efforts. We compared the quality of semantic norms obtained through rating scales, numeric estimation, and a less commonly used judgment format called best-worst scaling. We found that best-worst scaling usually produces norms with higher predictive validities than the other response formats, while requiring less data to be collected overall. We also found evidence that the various response formats may be producing qualitatively, rather than just quantitatively, different data. This raises the issue of potential response format bias, which previous efforts to collect semantic norms have not addressed, likely because they relied on a single type of response format for a single type of semantic judgment. We have made available software for creating best-worst stimuli and scoring best-worst data. We have also made available new norms for age of acquisition, valence, arousal, and concreteness collected using best-worst scaling. These norms include entries for 1,040 words, of which 1,034 are also contained in the ANEW norms (Bradley & Lang, Affective norms for English words (ANEW): Instruction manual and affective ratings (pp. 1-45). Technical Report C-1, The Center for Research in Psychophysiology, University of Florida, 1999).
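The abstract mentions software for creating best-worst stimuli and scoring best-worst data but does not describe how it works. As a rough illustration only, the Python sketch below assumes a simple design in which each word appears a fixed number of times in randomly composed 4-word tuples, and it scores responses with the common best-minus-worst counting method. The function names `make_tuples` and `score_counts`, the tuple size, and the example words and values are all hypothetical; this is not the authors' software and not necessarily the scoring method they used.

```python
import random
from collections import defaultdict


def make_tuples(items, tuple_size=4, passes=3, seed=0):
    """Build best-worst trials (hypothetical design, not the authors' tool).

    Each pass shuffles the item list and cuts it into consecutive tuples, so
    every item appears exactly once per pass and `passes` times overall.
    Assumes len(items) is a multiple of tuple_size.
    """
    if len(items) % tuple_size != 0:
        raise ValueError("len(items) must be a multiple of tuple_size")
    rng = random.Random(seed)  # seeded so the stimulus list is reproducible
    tuples = []
    for _ in range(passes):
        order = list(items)
        rng.shuffle(order)
        for start in range(0, len(order), tuple_size):
            tuples.append(tuple(order[start:start + tuple_size]))
    return tuples


def score_counts(trials):
    """Score best-worst responses with the best-minus-worst counting method.

    `trials` is an iterable of (tuple_of_items, best_item, worst_item).
    Each item's score is (times chosen best - times chosen worst) divided by
    the number of tuples it appeared in, giving a value in [-1, 1].
    """
    best = defaultdict(int)
    worst = defaultdict(int)
    seen = defaultdict(int)
    for items, best_item, worst_item in trials:
        if best_item not in items or worst_item not in items or best_item == worst_item:
            raise ValueError(f"invalid response for tuple {items!r}")
        for item in items:
            seen[item] += 1
        best[best_item] += 1
        worst[worst_item] += 1
    return {item: (best[item] - worst[item]) / seen[item] for item in seen}


if __name__ == "__main__":
    # Hypothetical "true" valence values, used only to simulate a consistent judge.
    true_valence = {"love": 8.7, "joy": 8.2, "gift": 7.8, "table": 5.2,
                    "chair": 5.1, "storm": 3.9, "grief": 2.1, "murder": 1.5}
    trials = []
    for tup in make_tuples(list(true_valence), tuple_size=4, passes=3, seed=1):
        best_item = max(tup, key=true_valence.get)   # judge picks most positive
        worst_item = min(tup, key=true_valence.get)  # and most negative word
        trials.append((tup, best_item, worst_item))
    for word, score in sorted(score_counts(trials).items(), key=lambda kv: -kv[1]):
        print(f"{word:>7} {score:+.2f}")
```

With this simulated judge, a word chosen as best every time it appears scores +1 and one always chosen as worst scores -1. A more careful design might also control how often pairs of words co-occur across tuples, and model-based scoring methods exist as alternatives to simple counting; this sketch does neither.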