Brysbaert Marc, Buchmeier Matthias, Conrad Markus, Jacobs Arthur M, Bölte Jens, Böhl Andrea
Department of Experimental Psychology, Ghent University, Belgium.
Exp Psychol. 2011;58(5):412-24. doi: 10.1027/1618-3169/a000123.
We review recent evidence indicating that researchers in experimental psychology may have used suboptimal estimates of word frequency. Word frequency measures should be based on a corpus of at least 20 million words containing language that participants in psychology experiments are likely to have been exposed to. In addition, the quality of word frequency measures should be ascertained by correlating them with behavioral word processing data. When we apply these criteria to the word frequency measures available for the German language, we find that the commonly used Celex frequencies are the weakest predictors of lexical decision times. Better results are obtained with the Leipzig frequencies, the dlexDB frequencies, and the Google Books 2000-2009 frequencies. However, as in other languages, the best performance is observed with subtitle-based word frequencies. The SUBTLEX-DE word frequencies collected for the present manuscript are made available in easy-to-use files and are free for educational purposes.
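The validation criterion described above, correlating a frequency measure with behavioral data, can be illustrated with a minimal sketch. It is not the authors' analysis: it assumes a plain Pearson correlation between log10 frequency per million and mean lexical decision time, with +1 added to each count so that unseen words stay defined. The function names, the toy counts, the corpus size, and the reaction times are all hypothetical and serve only to show the shape of the computation.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation between two equally long numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


def log_frequency(count, corpus_size):
    """log10 of frequency per million words; +1 keeps zero counts defined (assumption)."""
    return math.log10((count + 1) / corpus_size * 1_000_000)


def variance_explained(counts, corpus_size, reaction_times):
    """Squared correlation between log frequency and reaction time."""
    log_freqs = [log_frequency(c, corpus_size) for c in counts]
    r = pearson_r(log_freqs, reaction_times)
    return r * r


if __name__ == "__main__":
    # Invented toy data: raw counts in a hypothetical 25-million-word corpus
    # and invented mean lexical decision times in milliseconds.
    toy_counts = [12000, 3400, 560, 75, 9, 0]
    toy_rts = [520.0, 548.0, 590.0, 641.0, 702.0, 735.0]
    r2 = variance_explained(toy_counts, 25_000_000, toy_rts)
    print(f"variance explained: {r2:.3f}")
```

Running the same function over several frequency lists for the same set of words and reaction times gives one figure per list, which is the kind of comparison the abstract refers to. A fuller analysis would likely also model nonlinear frequency effects and control for variables such as word length.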