Moret-Tatay Carmen, Gamermann Daniel, Murphy Michael, Kuzmičová Anezka
a Universidad Católica de Valencia, San Vicente Mártir.
b Universidade Federal do Rio Grande do Sul.
J Gen Psychol. 2018 Apr-Jun;145(2):170-182. doi: 10.1080/00221309.2018.1459451. Epub 2018 May 14.
Word frequency, as estimated from a lexical corpus of a language, is one of the most robust factors in the literature on word processing. However, different sources can be used to determine the frequency of each word. Recent research has derived frequencies from movie subtitles, Twitter, blog posts, or newspapers. In this paper, we examine frequency estimates based on the World Wide Web. For this purpose, a Python script was developed to obtain the frequency of a word from online search results. These web-based (Google) frequencies were then used to estimate response times in a lexical decision task and were compared with traditional corpus frequencies. The Google frequencies predicted reaction times comparably to the traditional frequencies, although the explained variance was higher for the traditional database.
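The comparison described above, how much reaction-time variance each frequency source explains, can be illustrated with a minimal sketch. This is not the authors' script: the function names, the log10(count + 1) transform, and the use of a simple one-predictor linear fit are all assumptions made for illustration, and the numbers in the usage example are placeholders rather than data from the study.

```python
import math


def log_frequency(count):
    """Log10-transform a raw count; adding 1 keeps zero counts defined.

    The transform is an assumption here, not taken from the paper.
    """
    return math.log10(count + 1)


def r_squared(x, y):
    """Explained variance (R^2) of a simple linear regression of y on x.

    For a single predictor this equals the squared Pearson correlation.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("x and y must each vary")
    return (sxy * sxy) / (sxx * syy)


def compare_sources(reaction_times, web_counts, corpus_counts):
    """Return R^2 for each frequency source as a predictor of reaction time."""
    web = [log_frequency(c) for c in web_counts]
    corpus = [log_frequency(c) for c in corpus_counts]
    return {
        "web": r_squared(web, reaction_times),
        "corpus": r_squared(corpus, reaction_times),
    }


if __name__ == "__main__":
    # Placeholder values for illustration only; not results from the study.
    rts = [612.0, 655.0, 701.0, 748.0, 790.0]       # mean RT per word, ms
    web_hits = [9_000_000, 2_500_000, 400_000, 90_000, 12_000]
    corpus_freq = [5200, 1800, 310, 45, 6]
    print(compare_sources(rts, web_hits, corpus_freq))
```

With real data, the two R^2 values would be read side by side: similar values indicate that the two sources predict reaction times comparably, and the larger value identifies the source that explains more variance.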