Department of Translation, Interpreting and Communication - Faculty of Arts and Philosophy, Ghent University, Ghent, Belgium.
Department of Linguistics - Faculty of Arts, University of Antwerp, Antwerp, Belgium.
PLoS One. 2018 Oct 8;13(10):e0203794. doi: 10.1371/journal.pone.0203794. eCollection 2018.
While social media offer great communication opportunities, they also increase the vulnerability of young people to threatening situations online. Recent studies report that cyberbullying constitutes a growing problem among youngsters. Successful prevention depends on the adequate detection of potentially harmful messages, and the information overload on the Web requires intelligent systems to identify potential risks automatically. This paper focuses on automatic cyberbullying detection in social media text by modelling posts written by bullies, victims, and bystanders of online bullying. We describe the collection and fine-grained annotation of a cyberbullying corpus for English and Dutch and perform a series of binary classification experiments to determine the feasibility of automatic cyberbullying detection. We use linear support vector machines that exploit a rich feature set and investigate which information sources contribute most to the task. Experiments on a hold-out test set reveal promising results for the detection of cyberbullying-related posts. After optimisation of the hyperparameters, the classifier yields F1 scores of 64% and 61% for English and Dutch respectively, and considerably outperforms baseline systems.
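As an illustration of the evaluation metric reported above, the F1 score for a binary task is the harmonic mean of precision and recall on the positive class. The sketch below is not the authors' code: the function name `f1_score`, the label encoding (1 for a cyberbullying-related post, 0 otherwise), and the toy label lists are assumptions made purely for the example.

```python
def f1_score(gold, predicted, positive=1):
    """Compute the F1 score of the positive class for binary labels.

    Hypothetical helper for illustration; label 1 is assumed to mark
    a cyberbullying-related post.
    """
    pairs = list(zip(gold, predicted))
    # True positives: positive posts that were predicted positive.
    tp = sum(1 for g, p in pairs if g == positive and p == positive)
    # False positives: other posts that were predicted positive.
    fp = sum(1 for g, p in pairs if g != positive and p == positive)
    # False negatives: positive posts that were missed.
    fn = sum(1 for g, p in pairs if g == positive and p != positive)
    if tp == 0:
        # Precision and recall are both zero (or undefined); report 0.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy data (invented for this example, not taken from the corpus).
gold = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 1, 0, 1, 0]
print(f1_score(gold, predicted))
```

Because F1 ignores true negatives, it is a common choice when the positive class is rare, so that a classifier cannot score well simply by predicting the majority class.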