The Flickr frequency norms: What 17 years of images tagged online tell us about lexical processing.

Affiliations

University of Milano-Bicocca, Milan, Italy.

University of Tübingen, Tübingen, Germany.

Publication information

Behav Res Methods. 2024 Jan;56(1):126-147. doi: 10.3758/s13428-022-02031-y. Epub 2022 Dec 12.

Abstract

Word frequency is one of the best predictors of language processing. Word frequency norms are usually based entirely on natural-language text data, and thus represent what the literature typically refers to as purely linguistic experience. This study presents Flickr frequency norms as a novel word frequency measure from a domain-specific corpus inherently tied to extra-linguistic information: words used as image tags on social media. To obtain Flickr frequency measures, we exploited the photo-sharing platform Flickr Image (containing billions of photos) and extracted the number of uploaded images tagged with each of the words considered in the lexicon. Here, we systematically examine the peculiarities of Flickr frequency norms and show that Flickr frequency is a hybrid metric, lying at the intersection between language and visual experience, with specific biases induced by its being based on image-focused social media. Moreover, regression analyses indicate that Flickr frequency captures additional information beyond what is already encoded in existing norms of linguistic, sensorimotor, and affective experience. These new norms therefore capture aspects of language usage that are missing from traditional frequency measures: a portion of language usage reflecting the interplay between language and vision, which, as this study demonstrates, has its own impact on word processing. The Flickr frequency norms are openly available on the Open Science Framework (https://osf.io/2zfs3/).
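
The counting step described in the abstract (the number of images tagged with each word in a lexicon) can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the function names, the toy photo list, and the log10(count + 1) transform are all assumptions made for illustration, and the actual norms were obtained from the Flickr platform rather than from a local list.

```python
import math
from collections import Counter

def tag_image_counts(photos):
    """Count, for each tag, how many photos carry it.

    `photos` is a list of tag lists (hypothetical toy input). Tags are
    lowercased and de-duplicated per photo, so a photo counts at most
    once for a given tag.
    """
    counts = Counter()
    for tags in photos:
        counts.update({t.lower() for t in tags})
    return counts

def log_frequency(counts, lexicon):
    """Return log10(count + 1) for every word in the lexicon.

    The +1 offset is an assumption that keeps words with zero tagged
    images defined (they map to 0.0).
    """
    return {w: math.log10(counts.get(w, 0) + 1) for w in lexicon}

# Toy data standing in for tagged uploads (not real Flickr data).
photos = [["Dog", "park"], ["dog", "dog", "beach"], ["sunset", "beach"]]
counts = tag_image_counts(photos)
freqs = log_frequency(counts, ["dog", "beach", "idea"])
```

In this toy setting a concrete, picturable word such as "dog" receives a non-zero value, while an abstract word such as "idea" that nobody used as a tag falls to zero, which is the kind of image-driven bias the abstract refers to.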
