A large-scale database of Mandarin Chinese word associations from the Small World of Words Project.

Authors

Li Bing, Ding Ziyi, De Deyne Simon, Cai Qing

Affiliations

Key Laboratory of Brain Functional Genomics (MOE & STCSM), Affiliated Mental Health Center (ECNU), Institute of Brain and Education Innovation, School of Psychology and Cognitive Science, East China Normal University, Shanghai, China.

Univ. Lille, CNRS, UMR 9193 - SCALab - Sciences Cognitives et Sciences Affectives, 59000, Lille, France.

Publication information

Behav Res Methods. 2024 Dec 30;57(1):34. doi: 10.3758/s13428-024-02513-1.

Abstract

Word associations are among the most direct ways to measure word meaning in human minds, capturing various relationships, even those formed by non-linguistic experiences. Although large-scale word associations exist for Dutch, English, and Spanish, there is a lack of data for Mandarin Chinese, the most widely spoken language from a distinct language family. Here we present the Small World of Words-Zhongwen (Chinese) (SWOW-ZH), a word association dataset of Mandarin Chinese derived from a three-response word association task. This dataset covers responses for over 10,000 cue words from more than 40,000 participants. We constructed a semantic network based on this dataset and evaluated concurrent validity of association-based measures by predicting human processing latencies and comparing them with text-based measures and word embeddings. Our results show that word centrality significantly predicts lexical decision and word naming speed. Furthermore, SWOW-ZH notably outperforms text-based embeddings and transformer-based large language models in predicting human-rated word relationships across varying sample sizes. We also highlight the unique characteristics of Chinese word associations, particularly focusing on word formation. Combined, our findings underscore the critical importance of large-scale human experimental data and its unique contribution to understanding the complexity and richness of language.
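As a rough illustration of how a semantic network and a simple centrality measure can be derived from three-response association data, the sketch below may help. It is not the authors' pipeline: the abstract does not specify the network construction or the centrality measure used, so the toy trials (English glosses standing in for Mandarin words), the function names `association_strengths` and `in_degree`, and the choice of in-degree as the centrality measure are all assumptions made for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical toy data: one (cue, responses) pair per participant trial.
# English glosses stand in for Mandarin cue and response words.
trials = [
    ("cat", ["dog", "mouse", "pet"]),
    ("cat", ["dog", "cute", "pet"]),
    ("dog", ["cat", "pet", "loyal"]),
    ("pet", ["cat", "dog", "cute"]),
]

def association_strengths(trials):
    """Return cue -> {response: proportion of that cue's responses}."""
    counts = defaultdict(Counter)
    for cue, responses in trials:
        counts[cue].update(responses)
    strengths = {}
    for cue, counter in counts.items():
        total = sum(counter.values())
        strengths[cue] = {resp: n / total for resp, n in counter.items()}
    return strengths

def in_degree(strengths):
    """Count how many distinct cues elicit each word (an assumed centrality proxy)."""
    degree = Counter()
    for cue, responses in strengths.items():
        for resp in responses:
            if resp != cue:  # ignore self-loops
                degree[resp] += 1
    return degree

strengths = association_strengths(trials)
centrality = in_degree(strengths)
```

Here `strengths` holds the weighted, directed edges of the network (cue to response), and `centrality` ranks words by how many different cues lead to them. Other centrality measures could be substituted without changing the first step.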
