School of Psychology, University of Minho, Minho, Portugal
Behav Res Methods. 2014 Mar;46(1):240-53. doi: 10.3758/s13428-013-0350-1.
In this article, we introduce ESCOLEX, the first European Portuguese children's lexical database with grade-level-adjusted word frequency statistics. Computed from a 3.2-million-word corpus, ESCOLEX provides 48,381 word forms extracted from 171 elementary and middle school textbooks for 6- to 11-year-old children attending the first six grades in the Portuguese educational system. Like other children's grade-level databases (e.g., Carroll, Davies, & Richman, 1971; Corral, Ferrero, & Goikoetxea, Behavior Research Methods, 41, 1009-1017, 2009; Lété, Sprenger-Charolles, & Colé, Behavior Research Methods, Instruments, & Computers, 36, 156-166, 2004; Zeno, Ivens, Millard, & Duvvuri, 1995), ESCOLEX provides four frequency indices for each grade: overall word frequency (F), index of dispersion across the selected textbooks (D), estimated frequency per million words (U), and standard frequency index (SFI). It also provides a new measure, contextual diversity (CD). In addition, ESCOLEX provides each word's number of letters, part(s) of speech, number of syllables, and syllable structure, as well as adult frequencies taken from P-PAL (a European Portuguese corpus-based lexical database; Soares, Comesaña, Iriarte, Almeida, Simões, Costa, …, Machado, 2010; Soares, Iriarte, Almeida, Simões, Costa, França, …, Comesaña, in press). ESCOLEX will be a useful tool both for researchers interested in language processing and development and for professionals in need of verbal materials adjusted to children's developmental stages. ESCOLEX can be downloaded along with this article or from http://p-pal.di.uminho.pt/about/databases.
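The abstract names the indices F, D, U, SFI, and CD but does not give their formulas. As a rough illustration only, the following Python sketch computes them for one word over a set of tokenized textbooks. It assumes the definitions commonly attributed to Carroll, Davies, and Richman (1971) for D, U, and SFI, and it assumes CD is simply the number of textbooks containing the word. The function name `frequency_indices` and the input layout are hypothetical, and the exact computations used for ESCOLEX may differ.

```python
import math


def frequency_indices(word, texts):
    """Compute F, D, U, SFI and CD for `word`.

    `texts` is a list of token lists, one per textbook (hypothetical layout).
    Formulas are assumed from Carroll et al. (1971), not taken from ESCOLEX.
    """
    sizes = [len(tokens) for tokens in texts]          # tokens per textbook
    counts = [tokens.count(word) for tokens in texts]  # word count per textbook
    total_tokens = sum(sizes)                          # corpus size N
    n_texts = len(texts)

    f_total = sum(counts)                              # F: overall frequency
    cd = sum(1 for c in counts if c > 0)               # CD: textbooks with the word

    # D: dispersion in [0, 1], based on the word's proportion in each textbook.
    props = [c / s for c, s in zip(counts, sizes) if s > 0]
    sum_props = sum(props)
    if f_total == 0 or n_texts < 2:
        dispersion = 0.0
    else:
        weighted_log = sum(p * math.log(p) for p in props if p > 0) / sum_props
        dispersion = (math.log(sum_props) - weighted_log) / math.log(n_texts)

    # U: frequency per million tokens, adjusted downward for poor dispersion.
    f_min = sum(c * s for c, s in zip(counts, sizes)) / total_tokens
    u = (1_000_000 / total_tokens) * (
        f_total * dispersion + (1 - dispersion) * f_min
    )

    # SFI: logarithmic rescaling of U; undefined when the word never occurs.
    sfi = 10 * (math.log10(u) + 4) if u > 0 else None

    return {"F": f_total, "D": dispersion, "U": u, "SFI": sfi, "CD": cd}


if __name__ == "__main__":
    # Toy corpus: two tiny "textbooks" of two tokens each.
    toy_texts = [["a", "b"], ["a", "c"]]
    print(frequency_indices("a", toy_texts))
    print(frequency_indices("b", toy_texts))
```

Under these assumed definitions, a word spread evenly over all textbooks has D = 1, so U reduces to its raw per-million frequency, whereas a word confined to a single textbook has D = 0 and a correspondingly lower U.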