Simplified Chinese lexicon project: A lexical decision database with 8105 characters and 4864 pseudocharacters.

Authors

Wang Yixia, Wang Yanxue, Chen Qi, Keuleers Emmanuel

Affiliations

Tilburg University, Warandelaan 2, 5037 AB Tilburg, Netherlands.

South China Normal University, Guangzhou, China.

Publication information

Behav Res Methods. 2025 Jun 23;57(7):206. doi: 10.3758/s13428-025-02701-7.

DOI:10.3758/s13428-025-02701-7

PMID:40549266

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12185670/

Abstract

This paper presents the Simplified Chinese Lexicon Project (SCLP), which collects lexical decision data for all 8105 characters in the List of Commonly Used Standard Chinese Characters and for 4864 pseudocharacters, which were generated using a novel method that leveraged the hierarchical nature of Chinese characters. We compared the collected data to existing megastudies on Chinese characters, and found that the newly collected data performed similarly in terms of reliability. The comprehensive coverage of simplified Chinese characters in the present study added to the existing studies by allowing for a more fine-grained investigation of the effects of a variety of character attributes on visual processing. We illustrated these advantages by performing virtual experiments on visual complexity and on the interplay between neighborhood size and regularity. Our results indicated that characters with higher visual complexity were harder to recognize, in line with previous findings, while regular characters took longer to process when the neighborhood size was small. In addition, we present a new evaluation of the interaction between character frequency and subcomponent frequency, resulting in a three-way interaction among character frequency, radical frequency, and residual component frequency. Extending the investigation of subcomponent frequency to the analysis of pseudocharacters, we found that the interaction of radical frequency and residual component frequency also modulated pseudocharacter rejection. To support researchers in conducting behavioral experiments or statistical modeling, we provide both trial-level data and experiment materials.
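The abstract reports that the new data are comparable to existing megastudies in reliability but does not say which measure was used. One common choice for lexical decision megastudies is split-half reliability with the Spearman-Brown correction. The following is a minimal sketch under that assumption, not the authors' analysis code. The `(participant, item, rt_ms)` tuple layout and the toy values are hypothetical and do not reflect the actual SCLP file format.

```python
import random
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def split_half_reliability(trials, seed=0):
    """Spearman-Brown corrected split-half reliability of item mean RTs.

    trials: iterable of (participant, item, rt_ms) tuples, assumed to
    contain correct responses only (hypothetical layout).
    """
    rng = random.Random(seed)
    participants = sorted({p for p, _, _ in trials})
    rng.shuffle(participants)
    half_a = set(participants[: len(participants) // 2])

    # Collect each item's RTs separately for the two participant halves.
    by_item = {}
    for p, item, rt in trials:
        a, b = by_item.setdefault(item, ([], []))
        (a if p in half_a else b).append(rt)

    # Keep only items that were observed in both halves.
    pairs = [(mean(a), mean(b)) for a, b in by_item.values() if a and b]
    r = pearson([x for x, _ in pairs], [y for _, y in pairs])
    return 2 * r / (1 + r)  # Spearman-Brown correction for full-length data


# Toy data: RT = item baseline + participant offset (made-up values).
baselines = {"的": 500.0, "木": 550.0, "龘": 900.0}
offsets = {"p1": 0.0, "p2": 10.0, "p3": 20.0, "p4": 30.0}
toy_trials = [(p, c, base + off)
              for p, off in offsets.items()
              for c, base in baselines.items()]
reliability = split_half_reliability(toy_trials)
```

Because the toy RTs are a pure sum of item and participant effects with no noise, the two halves rank the items identically and the corrected reliability is 1. Real trial-level data would give a lower value, and averaging over many random splits gives a more stable estimate than a single split.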

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c3b3/12185670/716a078782cf/13428_2025_2701_Fig1_HTML.jpg
