School of Psychology, University of Nottingham, Nottingham, UK.
School of Psychology, Wrexham Glyndŵr University, Wrexham, UK.
Q J Exp Psychol (Hove). 2024 May;77(5):1052-1067. doi: 10.1177/17470218231190315. Epub 2023 Aug 30.
We present SUBTLEX-CY, a new word frequency database created from a 32-million-word corpus of Welsh television subtitles. An experiment comprising a lexical decision task examined SUBTLEX-CY frequency estimates against words with inconsistent frequencies in a much smaller Welsh corpus that is often used by researchers, the CEG, and three other Welsh word frequency databases. Words were selected that were classified as low frequency (LF) in SUBTLEX-CY and high frequency (HF) in CEG, and were compared with words that were classified as medium frequency (MF) in both SUBTLEX-CY and CEG. Reaction time analyses showed that words classified as HF in CEG were responded to more slowly than MF words, suggesting that the SUBTLEX-CY corpus provides a more reliable estimate of Welsh word frequencies. The new Welsh word frequency database, which also includes part-of-speech, contextual diversity, and other lexical information, is freely available for research purposes on the Open Science Framework repository at https://osf.io/9gkqm/.