

LEIA: Linguistic Embeddings for the Identification of Affect.

Author information

Aroyehun Segun Taofeek, Malik Lukas, Metzler Hannah, Haimerl Nikolas, Di Natale Anna, Garcia David

Affiliations

Department of Politics and Public Administration, University of Konstanz, Konstanz, Germany.

Graz University of Technology, Graz, Austria.

Publication information

EPJ Data Sci. 2023;12(1):52. doi: 10.1140/epjds/s13688-023-00427-0. Epub 2023 Nov 16.

Abstract

The wealth of text data generated by social media has enabled new kinds of analysis of emotions with language models. These models are often trained on small and costly datasets of text annotations produced by readers who guess the emotions expressed by others in social media posts. This affects the quality of emotion identification methods due to training data size limitations and noise in the production of labels used in model development. We present LEIA, a model for emotion identification in text that has been trained on a dataset of more than 6 million posts with self-annotated emotion labels for happiness, affection, sadness, anger, and fear. LEIA is based on a word masking method that enhances the learning of emotion words during model pre-training. LEIA achieves macro-F1 values of approximately 73 on three in-domain test datasets, outperforming other supervised and unsupervised methods in a strong benchmark that shows that LEIA generalizes across posts, users, and time periods. We further perform an out-of-domain evaluation on five different datasets of social media and other sources, showing LEIA's robust performance across media, data collection methods, and annotation schemes. Our results show that LEIA generalizes its classification of anger, happiness, and sadness beyond the domain it was trained on. LEIA can be applied in future research to provide better identification of emotions in text from the perspective of the writer.
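The abstract says only that LEIA's word masking method "enhances the learning of emotion words during model pre-training". It does not give the procedure. The following is a minimal sketch of one plausible way to bias masked-language-model masking toward emotion words. It assumes a hypothetical emotion lexicon (`emotion_words`) and illustrative masking rates (`p_emotion`, `p_other`). The function name `emotion_biased_mask` is hypothetical, and this is not the authors' implementation.

```python
import random
from typing import List, Optional, Sequence, Set, Tuple

MASK_TOKEN = "[MASK]"


def emotion_biased_mask(
    tokens: Sequence[str],
    emotion_words: Set[str],
    p_emotion: float = 0.5,
    p_other: float = 0.15,
    rng: Optional[random.Random] = None,
) -> Tuple[List[str], List[int]]:
    """Mask tokens for MLM pre-training, favouring emotion words.

    ASSUMPTION: emotion_words is a hypothetical lowercase lexicon, and
    p_emotion / p_other are illustrative rates, not values from the paper.
    Returns the masked sequence and the positions the model must predict.
    """
    rng = rng if rng is not None else random.Random()
    masked: List[str] = list(tokens)  # copy so the input is not mutated
    targets: List[int] = []
    for i, tok in enumerate(tokens):
        # Emotion words get a higher chance of being hidden, so the model
        # receives more training signal for predicting them from context.
        p = p_emotion if tok.lower() in emotion_words else p_other
        if rng.random() < p:
            masked[i] = MASK_TOKEN
            targets.append(i)
    return masked, targets


if __name__ == "__main__":
    lexicon = {"happy", "sad", "angry", "afraid", "love"}  # hypothetical lexicon
    sentence = "I am so happy but also a little afraid".split()
    masked, targets = emotion_biased_mask(sentence, lexicon, rng=random.Random(0))
    print(masked, targets)
```

Standard BERT-style pre-training selects about 15% of tokens uniformly. Of those, 80% are replaced with `[MASK]`, 10% with a random token, and 10% are left unchanged. That replacement rule is omitted here to keep the focus on the emotion-word bias.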

