Department of Business Informatics, Graduate School of Business, National Research University Higher School of Economics, Russia.

PeerJ Comput Sci. 2022 Jul 19;8:e1039. doi: 10.7717/peerj-cs.1039. eCollection 2022.

The Russian language is still not as well-resourced as English, especially in the field of sentiment analysis of Twitter content. Though several sentiment analysis datasets of tweets in Russian exist, they are all either automatically annotated or manually annotated by a single annotator. Thus, there is no inter-annotator agreement, or the annotation may be focused on a specific domain. In this article, we present RuSentiTweet, a new sentiment analysis dataset of general domain tweets in Russian. RuSentiTweet is currently the largest in its class for Russian, with 13,392 tweets manually annotated with moderate inter-rater agreement into five classes: Positive, Neutral, Negative, Speech Act, and Skip. As a source of data, we used Twitter Stream Grab, a historical collection of tweets obtained from the general Twitter API stream, which provides a 1% sample of public tweets. Additionally, we released a RuBERT-based sentiment classification model that achieved F1 = 0.6594 on the test subset.
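To make the reported score concrete, here is a minimal sketch of how an F1 score over the five RuSentiTweet classes could be computed. The abstract does not say which averaging was used, so macro-averaging (the unweighted mean of per-class F1) is an assumption here. The function name `macro_f1` and the toy labels are hypothetical and are not taken from the dataset or the released model.

```python
from collections import Counter

# The five annotation classes named in the abstract.
LABELS = ["Positive", "Neutral", "Negative", "Speech Act", "Skip"]


def macro_f1(gold, predicted, labels=LABELS):
    """Unweighted mean of per-class F1 scores (assumed averaging scheme)."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")

    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, predicted):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted class p, but it was wrong
            fn[g] += 1  # gold class g was missed

    scores = []
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        # A class with no gold and no predicted instances scores 0 here.
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(labels)


# Made-up toy labels for illustration only; not data from RuSentiTweet.
gold = ["Positive", "Neutral", "Negative", "Speech Act", "Skip", "Neutral"]
pred = ["Positive", "Neutral", "Neutral", "Speech Act", "Skip", "Neutral"]
score = macro_f1(gold, pred)
```

Macro-averaging weights every class equally, so a rare class such as Skip would influence the score as much as a frequent one. A micro-averaged or weighted score computed on the same predictions would generally differ.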

RuSentiTweet: a sentiment analysis dataset of general domain tweets in Russian.
