Chinese Language and Technology Center, National Taiwan Normal University, Taipei, Taiwan.
Department of Educational Psychology and Counseling/Chinese Language and Technology Center/Institute for Research Excellence in Learning Sciences, National Taiwan Normal University, Taipei, Taiwan.
Behav Res Methods. 2019 Oct;51(5):2310-2336. doi: 10.3758/s13428-019-01208-2.
The application of word associations has become increasingly widespread. However, the association norms produced by traditional free association tests tend not to exceed 10,000 stimulus words, making the number of associated words too small to be representative of the overall language. In this study we used text corpora totaling over 400 million Chinese words, along with a multitude of association measures, to automatically construct a Chinese Lexical Association Database (CLAD) comprising lexical associations for over 80,000 words. Comparison of the CLAD with a database of traditional Chinese word association norms shows that word associations extracted from large text corpora are similar in strength to those elicited from free association tests but contain a much greater number of associative word pairs. Additionally, the relatively small numbers of participants involved in the creation of traditional norms result in relatively coarse scales of association measurement, whereas the differentiation of association strengths is greatly enhanced in the CLAD. The CLAD thus provides researchers with a valuable supplement to traditional word association norms. A query website at www.chinesereadability.net/LexicalAssociation/CLAD/ affords access to the database.
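The abstract does not say which association measures were used to build the CLAD. As an illustration of what a corpus-based association measure can look like, the sketch below computes pointwise mutual information (PMI) from sentence-level co-occurrence. PMI is a commonly used measure, but its use here is an assumption, not a description of the authors' method. The function names `pmi_table` and `association`, the toy sentences, and the premise that the text is already word-segmented are all hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def pmi_table(sentences):
    """Map each co-occurring word pair to its PMI (log base 2).

    `sentences` is a list of token lists; the text is assumed to be
    word-segmented already. Probabilities are estimated as the fraction
    of sentences containing a word (or both words of a pair).
    """
    n = len(sentences)
    word_df = Counter()  # number of sentences containing each word
    pair_df = Counter()  # number of sentences containing both words
    for sent in sentences:
        types = sorted(set(sent))  # sorted so each pair has one canonical key
        word_df.update(types)
        pair_df.update(combinations(types, 2))
    return {
        (a, b): math.log2((c / n) / ((word_df[a] / n) * (word_df[b] / n)))
        for (a, b), c in pair_df.items()
    }


def association(scores, w1, w2):
    """Look up the PMI of an unordered pair; None if the pair never co-occurs."""
    return scores.get(tuple(sorted((w1, w2))))


# Toy, hand-made "corpus" purely for illustration (not data from the CLAD).
toy = [
    ["醫生", "護士", "醫院"],
    ["醫生", "醫院"],
    ["老師", "學生", "學校"],
    ["老師", "學生"],
]
scores = pmi_table(toy)
print(association(scores, "醫生", "醫院"))  # positive: the words co-occur more than chance
print(association(scores, "醫生", "老師"))  # None: the words never co-occur
```

Because a score of this kind is continuous, it can distinguish pairs more finely than the proportion of a small participant group giving a particular response, which is the contrast in measurement granularity that the abstract draws.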