Laboratory of Brain Imaging, Nencki Institute of Experimental Biology, Polish Academy of Sciences, 3 Pasteur Street, 02-093, Warsaw, Poland.
Department of Psychology, Columbia University, New York, NY, USA.
Behav Res Methods. 2022 Oct;54(5):2146-2161. doi: 10.3758/s13428-021-01697-0. Epub 2021 Dec 10.
Emotion lexicons are useful in research across various disciplines, but the availability of such resources remains limited for most languages. While existing emotion lexicons typically comprise words, it is a particular meaning of a word (rather than the word itself) that conveys emotion. To mitigate this issue, we present the Emotion Meanings dataset, a novel dataset of 6000 Polish word meanings. The word meanings are derived from the Polish wordnet (plWordNet), a large semantic network interlinking words by means of lexical and conceptual relations. The word meanings were manually rated for valence and arousal, along with a variety of basic emotion categories (anger, disgust, fear, sadness, anticipation, happiness, surprise, and trust). The annotations were found to be highly reliable, as demonstrated by the similarity between data collected in two independent samples: unsupervised (n = 21,317) and supervised (n = 561). Although we found the annotations to be relatively stable for female, male, younger, and older participants, we share both summary data and individual data to enable emotion research on different demographically specific subgroups. The word meanings are further accompanied by the relevant metadata, derived from open-source linguistic resources. Direct mapping to Princeton WordNet makes the dataset suitable for research on multiple languages. Altogether, this dataset provides a versatile resource that can be employed for emotion research in psychology, cognitive science, psycholinguistics, computational linguistics, and natural language processing.