Soares Ana Paula, Machado João, Costa Ana, Iriarte Álvaro, Simões Alberto, de Almeida José João, Comesaña Montserrat, Perea Manuel
Human Cognition Lab, CIPsi, School of Psychology, University of Minho, Minho, Portugal.
Q J Exp Psychol (Hove). 2015;68(4):680-96. doi: 10.1080/17470218.2014.964271. Epub 2014 Nov 7.
We examined the potential advantage of lexical databases based on subtitles and present SUBTLEX-PT, a new lexical database of 132,710 Portuguese words obtained from a 78-million-word corpus of film and television series subtitles, offering word frequency and contextual diversity measures. Additionally, we validated SUBTLEX-PT with a lexical decision study involving 1920 Portuguese words (and 1920 nonwords) of different lengths in letters (M = 6.89, SD = 2.10) and syllables (M = 2.99, SD = 0.94). Multiple regression analyses on latency and accuracy data were conducted to compare the proportion of variance explained by the Portuguese subtitle word frequency measures with that accounted for by the recent written-word frequency database (Procura-PALavras; P-PAL; Soares, Iriarte, et al., 2014). Like its international counterparts, SUBTLEX-PT explains approximately 15% more of the variance in the lexical decision performance of young adults than the P-PAL database. Moreover, in line with recent studies, contextual diversity accounted for approximately 2% more of the variance in participants' reading performance than the raw frequency counts obtained from subtitles. SUBTLEX-PT is freely available for research purposes (at http://p-pal.di.uminho.pt/about/databases).
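The abstract names two measures, word frequency and contextual diversity, without describing how they were computed. As a minimal sketch, assuming the common definitions (frequency as the total number of occurrences of a word in the corpus, contextual diversity as the number of distinct subtitle files in which the word appears at least once), the two counts could be obtained as follows. The tokenizer, the function names, and the toy subtitle strings are illustrative assumptions, not the procedure or data used to build SUBTLEX-PT.

```python
import re
from collections import Counter


def count_measures(documents):
    """Return (frequency, diversity) counters for a list of subtitle texts.

    frequency[word] = total occurrences across all documents
    diversity[word] = number of documents containing the word at least once
    """
    frequency = Counter()
    diversity = Counter()
    for text in documents:
        # Assumed tokenizer: runs of Unicode letters, lowercased.
        tokens = re.findall(r"[^\W\d_]+", text.lower())
        frequency.update(tokens)       # every occurrence counts
        diversity.update(set(tokens))  # each document counts once per word
    return frequency, diversity


def per_million(count, corpus_size):
    """Scale a raw count to occurrences per million tokens."""
    return count * 1_000_000 / corpus_size


# Hypothetical toy "subtitle files", invented for illustration only.
toy_subtitles = [
    "Olá, bom dia. Bom dia!",
    "Bom trabalho, obrigado.",
    "Obrigado. Até amanhã.",
]

frequency, diversity = count_measures(toy_subtitles)
corpus_size = sum(frequency.values())

for word in ("bom", "dia", "obrigado"):
    print(word, frequency[word], diversity[word],
          round(per_million(frequency[word], corpus_size)))
```

In this toy corpus "dia" and "obrigado" have the same raw frequency but differ in contextual diversity, because "dia" occurs twice in a single file while "obrigado" occurs once in each of two files. That dissociation is what allows contextual diversity to explain variance beyond raw frequency counts.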