Department of Communicative Disorders and Sciences, University at Buffalo, Buffalo, NY, USA.
University of California, Berkeley, CA, USA.
Q J Exp Psychol (Hove). 2020 Jun;73(6):841-855. doi: 10.1177/1747021819897560. Epub 2020 Feb 14.
Recently, a new crowd-sourced language metric called word prevalence was introduced, which estimates the proportion of the population that knows a given word. This measure has been shown to account for unique variance in large sets of lexical performance data. This article aims to build on the work of Brysbaert et al. and Keuleers et al. by introducing new corpus-based metrics that estimate how likely a word is to be an active member of the natural language environment, and hence known by a larger subset of the general population. These metrics are derived from an analysis of a newly collected corpus of over 25,000 fiction and non-fiction books, and they are shown to account for significantly more variance than past corpus-based measures.