Detecting racism and xenophobia using deep learning models on Twitter data: CNN, LSTM and BERT.

Authors

Benítez-Andrades José Alberto, González-Jiménez Álvaro, López-Brea Álvaro, Aveleira-Mata Jose, Alija-Pérez José-Manuel, García-Ordás María Teresa

Affiliations

SALBIS Research Group, Department of Electric, Systems and Automatics Engineering, Universidad de León, León, Spain.

Department of Electric, Systems and Automatics Engineering, Universidad de León, León, Spain.

Publication

PeerJ Comput Sci. 2022 Mar 1;8:e906. doi: 10.7717/peerj-cs.906. eCollection 2022.

DOI:10.7717/peerj-cs.906
PMID:35494847
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9044360/
Abstract

With the growth that social networks have experienced in recent years, it is entirely impossible to moderate content manually. Thanks to the different existing techniques in natural language processing, it is possible to generate predictive models that automatically classify texts into different categories. However, a weakness has been detected concerning the language used to train such models. This work aimed to develop a predictive model based on BERT, capable of detecting racist and xenophobic messages in tweets written in Spanish. A comparison was made with different Deep Learning models. A total of five predictive models were developed, two based on BERT and three using other deep learning techniques, CNN, LSTM and a model combining CNN + LSTM techniques. After exhaustively analyzing the results obtained by the different models, it was found that the one that got the best metrics was BETO, a BERT-based model trained only with texts written in Spanish. The results of our study show that the BETO model achieves a precision of 85.22% compared to the 82.00% precision of the mBERT model. The rest of the models obtained between 79.34% and 80.48% precision. On this basis, it has been possible to justify the vital importance of developing native transfer learning models for solving Natural Language Processing (NLP) problems in Spanish. Our main contribution is the achievement of promising results in the field of racism and hate speech in Spanish by applying different deep learning techniques.
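The abstract compares the models by precision (BETO 85.22%, mBERT 82.00%, the rest 79.34–80.48%). As a reminder of what that figure measures for a binary racist/non-racist tweet classifier, here is a minimal sketch in plain Python; the labels are hypothetical and this is not code from the paper:

```python
def precision(y_true, y_pred, positive=1):
    """Precision = true positives / all positive predictions."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if p == positive and t == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if p == positive and t != positive)
    return tp / (tp + fp) if (tp + fp) else 0.0

# Hypothetical labels: 1 = racist/xenophobic tweet, 0 = neither.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]
print(precision(y_true, y_pred))  # 3 TP, 1 FP -> 0.75
```

A high-precision model rarely flags an innocuous tweet as racist, which matters when flagged content is removed or sanctioned automatically.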


Figures (g001–g006):
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7e8f/9044360/170157c88c18/peerj-cs-08-906-g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7e8f/9044360/fcd26a0e4ffa/peerj-cs-08-906-g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7e8f/9044360/309736d162c0/peerj-cs-08-906-g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7e8f/9044360/b3b4e00a1cd1/peerj-cs-08-906-g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7e8f/9044360/4200d86c72f2/peerj-cs-08-906-g005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7e8f/9044360/64c21c59c0cb/peerj-cs-08-906-g006.jpg

Similar articles

1. Detecting racism and xenophobia using deep learning models on Twitter data: CNN, LSTM and BERT.
PeerJ Comput Sci. 2022 Mar 1;8:e906. doi: 10.7717/peerj-cs.906. eCollection 2022.
2. Roman Urdu Hate Speech Detection Using Transformer-Based Model for Cyber Security Applications.
Sensors (Basel). 2023 Apr 12;23(8):3909. doi: 10.3390/s23083909.
3. Hate speech detection and racial bias mitigation in social media based on BERT model.
PLoS One. 2020 Aug 27;15(8):e0237861. doi: 10.1371/journal.pone.0237861. eCollection 2020.
4. An efficient method for disaster tweets classification using gradient-based optimized convolutional neural networks with BERT embeddings.
MethodsX. 2024 Jul 3;13:102843. doi: 10.1016/j.mex.2024.102843. eCollection 2024 Dec.
5. Multi-class sentiment analysis of urdu text using multilingual BERT.
Sci Rep. 2022 Mar 31;12(1):5436. doi: 10.1038/s41598-022-09381-9.
6. Asian hate speech detection on Twitter during COVID-19.
Front Artif Intell. 2022 Aug 15;5:932381. doi: 10.3389/frai.2022.932381. eCollection 2022.
7. COVID-Twitter-BERT: A natural language processing model to analyse COVID-19 content on Twitter.
Front Artif Intell. 2023 Mar 14;6:1023281. doi: 10.3389/frai.2023.1023281. eCollection 2023.
8. Analyzing transfer learning impact in biomedical cross-lingual named entity recognition and normalization.
BMC Bioinformatics. 2021 Dec 17;22(Suppl 1):601. doi: 10.1186/s12859-021-04247-9.
9. A BERT Framework to Sentiment Analysis of Tweets.
Sensors (Basel). 2023 Jan 2;23(1):506. doi: 10.3390/s23010506.
10. A Natural Language Processing (NLP) Evaluation on COVID-19 Rumour Dataset Using Deep Learning Techniques.
Comput Intell Neurosci. 2022 Sep 14;2022:6561622. doi: 10.1155/2022/6561622. eCollection 2022.

Cited by

1. Sentiment Analysis of Social Media Data on Ebola Outbreak Using Deep Learning Classifiers.
Life (Basel). 2024 May 30;14(6):708. doi: 10.3390/life14060708.
2. Enhancing ASD detection accuracy: a combined approach of machine learning and deep learning models with natural language processing.
Health Inf Sci Syst. 2024 Mar 6;12(1):20. doi: 10.1007/s13755-024-00281-y. eCollection 2024 Dec.
3. Pashto offensive language detection: a benchmark dataset and monolingual Pashto BERT.
PeerJ Comput Sci. 2023 Oct 18;9:e1617. doi: 10.7717/peerj-cs.1617. eCollection 2023.
4. Label modification and bootstrapping for zero-shot cross-lingual hate speech detection.
Lang Resour Eval. 2023;57(4):1515-1546. doi: 10.1007/s10579-023-09637-4. Epub 2023 Feb 18.

References

1. Identifying vulgarity in Bengali social media textual content.
PeerJ Comput Sci. 2021 Oct 19;7:e665. doi: 10.7717/peerj-cs.665. eCollection 2021.
2. Transfer Learning for Classifying Spanish and English Text by Clinical Specialties.
Stud Health Technol Inform. 2021 May 27;281:377-381. doi: 10.3233/SHTI210184.
3. Topic detection and sentiment analysis in Twitter content related to COVID-19 from Brazil and the USA.
Appl Soft Comput. 2021 Mar;101:107057. doi: 10.1016/j.asoc.2020.107057. Epub 2020 Dec 26.
4. Detecting and Monitoring Hate Speech in Twitter.
Sensors (Basel). 2019 Oct 26;19(21):4654. doi: 10.3390/s19214654.
5. Ethical issues in qualitative research on internet communities.
BMJ. 2001 Nov 10;323(7321):1103-5. doi: 10.1136/bmj.323.7321.1103.