Detecting racism and xenophobia using deep learning models on Twitter data: CNN, LSTM and BERT.

Authors

Benítez-Andrades José Alberto, González-Jiménez Álvaro, López-Brea Álvaro, Aveleira-Mata Jose, Alija-Pérez José-Manuel, García-Ordás María Teresa

Affiliations

SALBIS Research Group, Department of Electric, Systems and Automatics Engineering, Universidad de León, León, Spain.

Department of Electric, Systems and Automatics Engineering, Universidad de León, León, Spain.

Publication

PeerJ Comput Sci. 2022 Mar 1;8:e906. doi: 10.7717/peerj-cs.906. eCollection 2022.

DOI:10.7717/peerj-cs.906
PMID:35494847
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9044360/
Abstract

With the growth that social networks have experienced in recent years, it is entirely impossible to moderate content manually. Thanks to the different existing techniques in natural language processing, it is possible to generate predictive models that automatically classify texts into different categories. However, a weakness has been detected concerning the language used to train such models. This work aimed to develop a predictive model based on BERT, capable of detecting racist and xenophobic messages in tweets written in Spanish. A comparison was made with different Deep Learning models. A total of five predictive models were developed, two based on BERT and three using other deep learning techniques, CNN, LSTM and a model combining CNN + LSTM techniques. After exhaustively analyzing the results obtained by the different models, it was found that the one that got the best metrics was BETO, a BERT-based model trained only with texts written in Spanish. The results of our study show that the BETO model achieves a precision of 85.22% compared to the 82.00% precision of the mBERT model. The rest of the models obtained between 79.34% and 80.48% precision. On this basis, it has been possible to justify the vital importance of developing native transfer learning models for solving Natural Language Processing (NLP) problems in Spanish. Our main contribution is the achievement of promising results in the field of racism and hate speech in Spanish by applying different deep learning techniques.
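The abstract compares the models by precision (BETO 85.22%, mBERT 82.00%, the rest 79.34–80.48%). As a reminder of what that figure measures for a binary racist/non-racist tweet classifier, here is a minimal sketch in plain Python; the labels are hypothetical and this is not code from the paper:

```python
def precision(y_true, y_pred, positive=1):
    """Precision = true positives / all positive predictions."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if p == positive and t == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if p == positive and t != positive)
    return tp / (tp + fp) if (tp + fp) else 0.0

# Hypothetical labels: 1 = racist/xenophobic tweet, 0 = neither.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]
print(precision(y_true, y_pred))  # 3 TP, 1 FP -> 0.75
```

A high-precision model rarely flags an innocuous tweet as racist, which matters when flagged content is removed or sanctioned automatically.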


Figures (g001–g006):
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7e8f/9044360/170157c88c18/peerj-cs-08-906-g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7e8f/9044360/fcd26a0e4ffa/peerj-cs-08-906-g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7e8f/9044360/309736d162c0/peerj-cs-08-906-g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7e8f/9044360/b3b4e00a1cd1/peerj-cs-08-906-g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7e8f/9044360/4200d86c72f2/peerj-cs-08-906-g005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7e8f/9044360/64c21c59c0cb/peerj-cs-08-906-g006.jpg

Similar articles

1. Detecting racism and xenophobia using deep learning models on Twitter data: CNN, LSTM and BERT.
PeerJ Comput Sci. 2022 Mar 1;8:e906. doi: 10.7717/peerj-cs.906. eCollection 2022.
2. Roman Urdu Hate Speech Detection Using Transformer-Based Model for Cyber Security Applications.
Sensors (Basel). 2023 Apr 12;23(8):3909. doi: 10.3390/s23083909.
3. Hate speech detection and racial bias mitigation in social media based on BERT model.
PLoS One. 2020 Aug 27;15(8):e0237861. doi: 10.1371/journal.pone.0237861. eCollection 2020.
4. An efficient method for disaster tweets classification using gradient-based optimized convolutional neural networks with BERT embeddings.
MethodsX. 2024 Jul 3;13:102843. doi: 10.1016/j.mex.2024.102843. eCollection 2024 Dec.
5. Multi-class sentiment analysis of urdu text using multilingual BERT.
Sci Rep. 2022 Mar 31;12(1):5436. doi: 10.1038/s41598-022-09381-9.
6. Asian hate speech detection on Twitter during COVID-19.
Front Artif Intell. 2022 Aug 15;5:932381. doi: 10.3389/frai.2022.932381. eCollection 2022.
7. COVID-Twitter-BERT: A natural language processing model to analyse COVID-19 content on Twitter.
Front Artif Intell. 2023 Mar 14;6:1023281. doi: 10.3389/frai.2023.1023281. eCollection 2023.
8. Analyzing transfer learning impact in biomedical cross-lingual named entity recognition and normalization.
BMC Bioinformatics. 2021 Dec 17;22(Suppl 1):601. doi: 10.1186/s12859-021-04247-9.
9. A BERT Framework to Sentiment Analysis of Tweets.
Sensors (Basel). 2023 Jan 2;23(1):506. doi: 10.3390/s23010506.
10. A Natural Language Processing (NLP) Evaluation on COVID-19 Rumour Dataset Using Deep Learning Techniques.
Comput Intell Neurosci. 2022 Sep 14;2022:6561622. doi: 10.1155/2022/6561622. eCollection 2022.

Cited by

1. Sentiment Analysis of Social Media Data on Ebola Outbreak Using Deep Learning Classifiers.
Life (Basel). 2024 May 30;14(6):708. doi: 10.3390/life14060708.
2. Enhancing ASD detection accuracy: a combined approach of machine learning and deep learning models with natural language processing.
Health Inf Sci Syst. 2024 Mar 6;12(1):20. doi: 10.1007/s13755-024-00281-y. eCollection 2024 Dec.
3. Pashto offensive language detection: a benchmark dataset and monolingual Pashto BERT.
PeerJ Comput Sci. 2023 Oct 18;9:e1617. doi: 10.7717/peerj-cs.1617. eCollection 2023.
4. Label modification and bootstrapping for zero-shot cross-lingual hate speech detection.
Lang Resour Eval. 2023;57(4):1515-1546. doi: 10.1007/s10579-023-09637-4. Epub 2023 Feb 18.

References

1. Identifying vulgarity in Bengali social media textual content.
PeerJ Comput Sci. 2021 Oct 19;7:e665. doi: 10.7717/peerj-cs.665. eCollection 2021.
2. Transfer Learning for Classifying Spanish and English Text by Clinical Specialties.
Stud Health Technol Inform. 2021 May 27;281:377-381. doi: 10.3233/SHTI210184.
3. Topic detection and sentiment analysis in Twitter content related to COVID-19 from Brazil and the USA.
Appl Soft Comput. 2021 Mar;101:107057. doi: 10.1016/j.asoc.2020.107057. Epub 2020 Dec 26.
4. Detecting and Monitoring Hate Speech in Twitter.
Sensors (Basel). 2019 Oct 26;19(21):4654. doi: 10.3390/s19214654.
5. Ethical issues in qualitative research on internet communities.
BMJ. 2001 Nov 10;323(7321):1103-5. doi: 10.1136/bmj.323.7321.1103.