Department for German Language and Literature, Ruhr University Bochum, Universitätsstraße 150, 44801, Bochum, Germany.
Department of Psychiatry and Psychotherapy, Charité Campus Mitte, Charité Universitätsmedizin Berlin, Corporate Member of Freie Universität Berlin, Humboldt-Universität zu Berlin, and Berlin Institute of Health at Charité - Universitätsmedizin Berlin, BIH Biomedical Innovation Academy, BIH Charité Digital Clinician Scientist Program, Berlin, Germany.
Behav Res Methods. 2024 Dec;56(8):8159-8180. doi: 10.3758/s13428-024-02444-x. Epub 2024 Aug 15.
We introduce a novel dataset of affective, semantic, and descriptive norms for all face emojis available at the time of data collection. We gathered and examined subjective ratings of emojis from 138 German speakers along five dimensions: valence, arousal, familiarity, clarity, and visual complexity. Additionally, we provide absolute frequency counts of emoji use, drawn from an extensive Twitter corpus as well as a much smaller WhatsApp database. Our results replicate, for emojis, the well-established quadratic relationship between arousal and valence known for words. We also report associations among the variables: for example, the subjective familiarity of an emoji is strongly correlated with its usage frequency, and positively associated with its emotional valence and clarity of meaning. We establish the meanings associated with face emojis by asking participants for up to three descriptions of each emoji. Using these linguistic data, we computed vector embeddings for each emoji, enabling an exploration of their distribution within the semantic space. Our description-based emoji vector embeddings not only capture typical meaning components of emojis, such as their valence, but also surpass simple definitions and direct emoji2vec models in reflecting the semantic relationship between emojis and words. The dataset shows robust reliability and validity. This new semantic norm for face emojis can inform the design of future highly controlled experiments on the cognitive processing of emojis, their lexical representation, and their linguistic properties.
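The idea of a description-based emoji embedding can be illustrated with a minimal sketch. It assumes one simple approach, averaging the word vectors of the descriptions given for an emoji and comparing the result to words by cosine similarity; the abstract does not specify the exact procedure, so this is not necessarily the authors' method. The word vectors, emoji names, descriptions, and function names below are all hypothetical and invented for illustration.

```python
import math

# Hypothetical toy word vectors (3 dimensions, invented for illustration).
# A real analysis would load pretrained word embeddings instead.
WORD_VECTORS = {
    "happy": [0.9, 0.1, 0.0],
    "joy": [0.8, 0.2, 0.1],
    "smile": [0.7, 0.3, 0.0],
    "sad": [-0.8, 0.1, 0.2],
    "cry": [-0.7, 0.3, 0.3],
    "tears": [-0.6, 0.2, 0.4],
}

# Hypothetical participant descriptions (up to three per emoji).
DESCRIPTIONS = {
    "grinning_face": ["happy", "joy", "smile"],
    "crying_face": ["sad", "cry", "tears"],
}


def mean_vector(words):
    """Average the vectors of all known words; unknown words are skipped."""
    vectors = [WORD_VECTORS[w] for w in words if w in WORD_VECTORS]
    if not vectors:
        raise ValueError("no known words in descriptions")
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# One embedding per emoji, built only from its descriptions.
EMOJI_VECTORS = {name: mean_vector(words) for name, words in DESCRIPTIONS.items()}


def similarity(emoji_name, word):
    """Similarity between an emoji embedding and a single word vector."""
    return cosine(EMOJI_VECTORS[emoji_name], WORD_VECTORS[word])


if __name__ == "__main__":
    for emoji_name in EMOJI_VECTORS:
        for word in ("happy", "sad"):
            print(emoji_name, word, round(similarity(emoji_name, word), 3))
```

Because emojis and words end up in the same vector space, emoji-word and emoji-emoji similarities can be computed with the same function, which is what makes such embeddings usable for exploring a shared semantic space.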