Khan Sahrish, Abbasi Rabeeh Ayaz, Sindhu Muddassar Azam, Arafat Sachi, Khattak Akmal Saeed, Daud Ali, Mushtaq Mubashar
Department of Computer Science, Quaid-i-Azam University, Islamabad, Pakistan.
Department of Computer Science, University of Warwick, Coventry CV4 7AL, UK.
Heliyon. 2024 Nov 26;10(23):e40611. doi: 10.1016/j.heliyon.2024.e40611. eCollection 2024 Dec 15.
Hate speech is a major problem on microblogging platforms, and its automatic detection is a growing research area. Most existing work focuses on analyzing the content of social media posts. Our study instead focuses on predicting which users are likely to become targets of hate speech. This paper proposes a novel Hate-speech Target Prediction Framework (HTPK) and introduces a new Hate Speech Target Dataset (HSTD), which contains tweets labeled as targets or non-targets of hate speech. Using a combination of Term Frequency-Inverse Document Frequency (TFIDF), N-gram, and Part-of-Speech (PoS) tag features, we tested various machine learning algorithms; the Naïve Bayes (NB) classifier performed best, with an accuracy of 93%, significantly outperforming the other algorithms. This research identifies the optimal combination of features for predicting hate speech targets and compares various machine learning algorithms, providing a foundation for more proactive hate speech mitigation on social media platforms.
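The abstract does not give implementation details, so the following is only a minimal sketch, under stated assumptions, of the kind of pipeline it describes: word n-gram features weighted by TF-IDF, optional PoS-tag features, and a multinomial Naïve Bayes classifier. The function and class names, whitespace tokenization, the unigram-plus-bigram range, the smoothed IDF formula, Laplace smoothing with `alpha=1.0`, and the `POS_` feature prefix are all hypothetical choices for illustration, not the authors' configuration. PoS tags are assumed to come from an external tagger that is not shown, and the toy tweets are invented placeholders, not HSTD data.

```python
import math
from collections import Counter, defaultdict


def extract_features(text, pos_tags=(), n_max=2):
    """Word n-grams (n = 1..n_max) plus optional PoS-tag features.

    Assumptions: lowercase whitespace tokenization; pos_tags is a sequence
    of tags produced by some external tagger (hypothetical, not shown).
    """
    tokens = text.lower().split()
    feats = []
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            feats.append(" ".join(tokens[i:i + n]))
    # Each PoS tag becomes its own feature, e.g. "POS_NN" (hypothetical prefix).
    feats += ["POS_" + tag for tag in pos_tags]
    return feats


def fit_idf(docs):
    """Smoothed IDF, log((1 + N) / (1 + df)) + 1, computed over training docs."""
    n_docs = len(docs)
    df = Counter()
    for feats in docs:
        df.update(set(feats))
    return {t: math.log((1 + n_docs) / (1 + d)) + 1 for t, d in df.items()}


def tfidf(feats, idf):
    """Raw term count times IDF; features unseen during training are dropped."""
    counts = Counter(feats)
    return {t: c * idf[t] for t, c in counts.items() if t in idf}


class MultinomialNB:
    """Multinomial Naive Bayes over (fractional) TF-IDF weights, Laplace-smoothed."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, X, y):
        self.classes = sorted(set(y))
        self.vocab = {t for x in X for t in x}
        n = len(y)
        self.log_prior = {c: math.log(y.count(c) / n) for c in self.classes}
        self.log_lik = {}
        for c in self.classes:
            # Sum TF-IDF weights of each feature over documents of class c.
            totals = defaultdict(float)
            for x, label in zip(X, y):
                if label == c:
                    for t, w in x.items():
                        totals[t] += w
            denom = sum(totals.values()) + self.alpha * len(self.vocab)
            self.log_lik[c] = {
                t: math.log((totals[t] + self.alpha) / denom) for t in self.vocab
            }
        return self

    def predict(self, x):
        # Log-posterior up to a constant: log prior + weighted log-likelihoods.
        def score(c):
            return self.log_prior[c] + sum(
                w * self.log_lik[c][t] for t, w in x.items() if t in self.vocab
            )
        return max(self.classes, key=score)


if __name__ == "__main__":
    # Invented toy examples; labels mirror the target / non-target split.
    train = [
        ("they insult you every day", "target"),
        ("people attack and insult him online", "target"),
        ("nice photo from the trip", "non-target"),
        ("great match last night", "non-target"),
    ]
    docs = [extract_features(text) for text, _ in train]
    idf = fit_idf(docs)
    X = [tfidf(d, idf) for d in docs]
    y = [label for _, label in train]
    clf = MultinomialNB().fit(X, y)
    print(clf.predict(tfidf(extract_features("they insult him"), idf)))  # target
```

Applying Naïve Bayes directly to TF-IDF weights rather than raw counts is a common practical variant; with a real dataset one would also evaluate on a held-out split instead of scoring a single sentence.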