King Saud University, Riyadh, Saudi Arabia.
J Med Internet Res. 2020 Dec 8;22(12):e22609. doi: 10.2196/22609.
The massive scale of social media platforms requires automatic solutions for detecting hate speech, which reduce the need for manual content analysis. Most previous literature has cast hate speech detection as a supervised text classification task using classical machine learning methods or, more recently, deep learning methods. However, work investigating this problem in Arabic cyberspace is still limited compared to the published work on English text.
This study aims to identify hate speech related to the COVID-19 pandemic posted by Twitter users in the Arab region and to discover the main issues discussed in tweets containing hate speech.
We used the ArCOV-19 dataset, an ongoing collection of Arabic tweets related to COVID-19, starting from January 27, 2020. Tweets were analyzed for hate speech using a pretrained convolutional neural network (CNN) model; each tweet was given a score between 0 and 1, with 1 being the most hateful text. We also used nonnegative matrix factorization to discover the main issues and topics discussed in hate tweets.
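The topic-discovery step can be illustrated with a small sketch of nonnegative matrix factorization. This is not the authors' pipeline: the vocabulary, the toy document-term counts, the choice of 2 topics, and the use of Lee-Seung multiplicative updates are all assumptions made for illustration, and the helper names (`nmf`, `top_terms`, `frobenius_error`) are hypothetical.

```python
import random

# Hypothetical toy vocabulary and document-term count matrix (rows = tweets).
# These values are illustrative only and do not come from the ArCOV-19 dataset.
VOCAB = ["virus", "lockdown", "vaccine", "border", "sanctions", "election"]
V = [
    [2, 1, 1, 0, 0, 0],
    [1, 2, 1, 0, 0, 0],
    [0, 0, 0, 2, 1, 1],
    [0, 0, 0, 1, 1, 2],
]

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]

def transpose(m):
    return [list(col) for col in zip(*m)]

def nmf(v, k, iters=300, seed=0, eps=1e-9):
    """Factor nonnegative v (docs x terms) into w (docs x k) and h (k x terms)
    with multiplicative updates for the squared Frobenius error."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(k)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(k)]
    for _ in range(iters):
        # H <- H * (W^T V) / (W^T W H), elementwise
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)]
             for i in range(k)]
        # W <- W * (V H^T) / (W H H^T), elementwise
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)]
             for i in range(n)]
    return w, h

def frobenius_error(v, w, h):
    """Frobenius norm of the reconstruction residual v - w h."""
    wh = matmul(w, h)
    return sum((a - b) ** 2 for rv, rw in zip(v, wh) for a, b in zip(rv, rw)) ** 0.5

def top_terms(h, vocab, n=3):
    """Return the n highest-weighted vocabulary terms for each topic row of h."""
    return [[vocab[j] for j in sorted(range(len(vocab)), key=lambda j: -row[j])[:n]]
            for row in h]

w, h = nmf(V, k=2)
topics = top_terms(h, VOCAB, n=3)
for idx, terms in enumerate(topics):
    print(f"topic {idx}: {', '.join(terms)}")
print(f"reconstruction error: {frobenius_error(V, w, h):.3f}")
```

Each row of `h` weights the vocabulary terms for one topic, and each row of `w` gives a tweet's loading on those topics. In practice a library implementation applied to a TF-IDF matrix of the hate tweets would be the more likely choice; the pure-Python version here only makes the update rules visible.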
Non-hate tweets greatly outnumbered hate tweets in Twitter data from the Arab region: the percentage of hate tweets among COVID-19-related tweets was 3.2% (11,743/547,554). The majority of hate tweets (8385/11,743, 71.4%) contained a low level of hate based on the score provided by the CNN. Saudi Arabia was the Arab country from which the most COVID-19 hate tweets originated during the pandemic. The largest number of hate tweets appeared during March 1-30, 2020, representing 51.9% of all hate tweets (6095/11,743). Contrary to expectations, the spread of COVID-19-related hate speech on Twitter in the Arab region was only weakly correlated with the spread of the pandemic, based on the Pearson correlation coefficient (r=0.1982, P=.50). The study also identified the topics commonly discussed in hate tweets during the pandemic. Of the 7 extracted topics, 6 were related to hate speech against China and Iran. Arab users also discussed topics related to political conflicts in the Arab region during the COVID-19 pandemic.
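The correlation reported above relates two series: hate tweet volume and pandemic spread. A minimal sketch of the Pearson coefficient follows, assuming two equal-length series of counts per time period. The weekly numbers below are hypothetical placeholders, not the study's data, and are not intended to reproduce the reported r or P value; the function names are likewise illustrative.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length numeric series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("series must have the same length, at least 2")
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return sxy / math.sqrt(sxx * syy)

def t_statistic(r, n):
    """t statistic for testing r against zero, with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1 - r * r))

# Hypothetical weekly counts, for illustration only.
weekly_hate_tweets = [120, 340, 910, 1500, 1320, 800, 450, 300]
weekly_new_cases = [5, 40, 150, 600, 1900, 4200, 7800, 11000]

r = pearson_r(weekly_hate_tweets, weekly_new_cases)
t = t_statistic(r, len(weekly_hate_tweets))
print(f"r = {r:.4f}, t = {t:.4f}, df = {len(weekly_hate_tweets) - 2}")
```

A P value is obtained by comparing the t statistic against a Student t distribution with n - 2 degrees of freedom, which typically requires a statistics library rather than the standard library alone.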
The COVID-19 pandemic poses serious public health challenges to nations worldwide. During the COVID-19 pandemic, frequent use of social media can contribute to the spread of hate speech. Hate speech on the web can have a negative impact on society, and hate speech may have a direct correlation with real hate crimes, which increases the threat associated with being targeted by hate speech and abusive language. This study is the first to analyze hate speech in the context of Arabic COVID-19-related tweets in the Arab region.