Computer Science, Innovation Centre, University of Exeter, Exeter, United Kingdom.
PLoS One. 2024 Apr 18;19(4):e0299490. doi: 10.1371/journal.pone.0299490. eCollection 2024.
Researchers commonly perform sentiment analysis on large collections of short texts, such as tweets, Reddit posts or newspaper headlines, that all focus on a specific topic, theme or event. General-purpose sentiment analysis methods are usually used. These perform well on average but miss the variation in meaning that occurs across contexts: for example, the word "active" carries a very different intention and valence in the phrase "active lifestyle" than in "active volcano". This work presents a new approach, CIDER (Context Informed Dictionary and sEmantic Reasoner), which performs context-sensitive linguistic analysis: the valence of sentiment-laden terms is inferred from the whole corpus before being used to score the individual texts. In this paper, we detail the CIDER algorithm and demonstrate that it outperforms state-of-the-art generalist unsupervised sentiment analysis techniques on a large collection of tweets about the weather. CIDER is also applicable to alternative (non-sentiment) linguistic scales. We present a case study on gender in the UK that identifies highly gendered and sentiment-laden days. We have made our implementation of CIDER available as a Python package: https://pypi.org/project/ciderpolarity/.
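The core idea is a two-pass design. First, the polarity of each term is inferred from the corpus as a whole. Second, that corpus-specific lexicon is used to score individual texts. The abstract does not give CIDER's actual equations, so the sketch below substitutes a different, well-known technique: document-level semantic-orientation PMI (SO-PMI) against small hand-picked seed sets. The function names `infer_polarity` and `score_text`, the seed words, the whitespace tokenizer and the add-0.5 smoothing are all illustrative assumptions. None of them is the `ciderpolarity` package's API or the paper's algorithm.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive whitespace tokenizer (assumption); real pipelines handle punctuation, emoji, etc.
    return set(text.lower().split())


def infer_polarity(texts, pos_seeds, neg_seeds, min_count=2):
    """Pass 1 (hypothetical): give each frequent word a corpus-specific polarity via SO-PMI."""
    docs = [tokenize(t) for t in texts]
    pos_seeds, neg_seeds = set(pos_seeds), set(neg_seeds)
    # Flag documents containing at least one positive / negative seed word.
    has_pos = [bool(d & pos_seeds) for d in docs]
    has_neg = [bool(d & neg_seeds) for d in docs]
    n_pos, n_neg = sum(has_pos), sum(has_neg)
    doc_freq = Counter(w for d in docs for w in d)
    co_pos, co_neg = Counter(), Counter()
    for d, p, n in zip(docs, has_pos, has_neg):
        for w in d:
            if p:
                co_pos[w] += 1
            if n:
                co_neg[w] += 1
    polarity = {}
    for w, count in doc_freq.items():
        if count < min_count:
            continue  # too rare to estimate reliably from this corpus
        # Add-0.5 smoothing avoids log(0); the n_neg/n_pos factor corrects for seed base rates.
        ratio = ((co_pos[w] + 0.5) * (n_neg + 0.5)) / ((co_neg[w] + 0.5) * (n_pos + 0.5))
        polarity[w] = math.log2(ratio)
    return polarity


def score_text(text, polarity):
    """Pass 2 (hypothetical): mean inferred polarity of the known words; 0.0 if none are known."""
    values = [polarity[w] for w in tokenize(text) if w in polarity]
    return sum(values) / len(values) if values else 0.0


# Toy weather corpus (invented for illustration, not data from the paper).
corpus = [
    "hot sunny day lovely",
    "hot weather great for the beach",
    "hot and lovely afternoon",
    "cold rain awful commute",
    "cold wind terrible day",
    "rain again awful",
]
lexicon = infer_polarity(corpus, pos_seeds={"lovely", "great"}, neg_seeds={"awful", "terrible"})
print(round(lexicon["hot"], 2), round(lexicon["cold"], 2))  # 2.81 -2.32
print(score_text("hot again", lexicon) > 0)  # True
```

In this toy corpus "hot" becomes positive because it co-occurs only with positive seeds, whereas in a heatwave-complaint corpus the same code would push it negative. That is the kind of context dependence the abstract describes. Replacing the two seed sets with, say, hypothetical gendered seed words would give a non-sentiment scale in the same way.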