Duzha Armend, Casadei Cristiano, Tosi Michael, Celli Fabio
Maggioli S.p.A, Via Bornaccino 101, 47822 Santarcangelo di Romagna, Italy.
SN Soc Sci. 2021;1(9):223. doi: 10.1007/s43545-021-00234-2. Epub 2021 Aug 20.
Accurate detection of hate speech against politicians, policy making and political ideas is crucial to maintaining democracy and free speech. Unfortunately, the labelled data needed to train hate speech detection models is limited and domain-dependent. In this paper, we address the classification of hate speech against policy makers in Italian-language tweets, producing the first resource of this type in this language. We collected and annotated 1264 tweets, examined the cases of disagreement between annotators, and performed in-domain and cross-domain hate speech classification with different features and algorithms. We achieved a ROC AUC of 0.83 and analyzed the most predictive attributes, also finding that language features differ between the anti-policymaker and anti-immigration domains. Finally, we visualized networks of hashtags to capture the topics used in hateful and normal tweets.