

The Italian Crowdsourcing Project: Visual word recognition times for 130,495 Italian words.

Authors

Amenta Simona, de Varda Andrea Gregor, Mandera Pawel, Keuleers Emmanuel, Brysbaert Marc, Marelli Marco

Affiliations

Department of Psychology, University of Milano-Bicocca, P.zza dell'Ateneo Nuovo, 1, 20126, Milano, Italy.

Lingvist Technologies, Tallinn, Estonia.

Publication information

Behav Res Methods. 2024 Dec 28;57(1):26. doi: 10.3758/s13428-024-02548-4.

Abstract

Despite being widely spoken and studied by language and cognitive scientists, Italian lacks large resources of language processing data. The Italian Crowdsourcing Project (ICP) is a dataset of word recognition times and accuracy including responses to 130,465 words, which makes it the largest dataset of its kind item-wise. The data were collected in an online word knowledge task in which over 156,000 native speakers of Italian took part. We validated the ICP dataset by (1) showing that ICP reaction times correlate strongly (r = .78) with lexical decision latencies collected in a traditional lab experiment, (2) showing that the effects of major psycholinguistic variables (e.g., frequency, length) can be replicated in this dataset, and (3) replicating the effect of word prevalence, which we compute here for the first time for Italian. Given the inclusion of many inflectional forms of verbs, adjectives, and nouns, we further showcase the potential of this dataset by exploring two phenomena that build on the peculiar properties of Italian: inflectional entropy in verb paradigms and the clitic effect in isolated word recognition. In this paper we present the ICP resource and release response times, accuracy, and prevalence estimates for all the words included.
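Two quantities named in the abstract, word prevalence and inflectional entropy, can be illustrated with a short Python sketch. The abstract does not give the ICP formulas, so the definitions below are assumptions based on common usage in the literature. Prevalence is taken as the probit (inverse standard-normal) transform of the proportion of participants who know a word. Inflectional entropy is taken as the Shannon entropy, in bits, of the frequency distribution over a lemma's inflected forms. The function names and the clipping constant `eps` are hypothetical.

```python
import math
from collections.abc import Iterable
from statistics import NormalDist


def prevalence(n_known: int, n_total: int, eps: float = 0.005) -> float:
    """Probit-transformed proportion of participants who know a word.

    Assumption: mirrors the usual word-prevalence convention, not necessarily
    the exact ICP computation. The proportion is clipped to [eps, 1 - eps]
    so that words known by everyone (or no one) get a finite score.
    """
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    p = n_known / n_total
    p = min(max(p, eps), 1 - eps)
    # Inverse CDF of the standard normal distribution (the probit function).
    return NormalDist().inv_cdf(p)


def inflectional_entropy(form_freqs: Iterable[float]) -> float:
    """Shannon entropy (bits) over the frequencies of a lemma's inflected forms.

    Assumption: a generic entropy definition; the paper's operationalization
    for Italian verb paradigms may differ (e.g., in smoothing or form sets).
    """
    freqs = [f for f in form_freqs if f > 0]  # zero-frequency forms contribute nothing
    total = sum(freqs)
    if total == 0:
        return 0.0
    return -sum((f / total) * math.log2(f / total) for f in freqs)
```

Under these assumptions, a word known by exactly half of the respondents gets a prevalence of 0, and a verb whose four attested forms are equally frequent has an inflectional entropy of 2 bits. A paradigm dominated by a single form approaches 0 bits.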

