Roman Urdu Hate Speech Detection Using Transformer-Based Model for Cyber Security Applications.

Affiliations

Department of Computer Science, Islamia College Peshawar, Peshawar 25130, Pakistan.

Malaysian Institute of Information Technology, Universiti Kuala Lumpur, Kuala Lumpur 50250, Malaysia.

Publication information

Sensors (Basel). 2023 Apr 12;23(8):3909. doi: 10.3390/s23083909.

DOI: 10.3390/s23083909
PMID: 37112249
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10143294/
Abstract

Social media applications, such as Twitter and Facebook, allow users to communicate and share their thoughts, status updates, opinions, photographs, and videos around the globe. Unfortunately, some people use these platforms to disseminate hate speech and abusive language. The growth of hate speech may result in hate crimes, cyber violence, and substantial harm to cyberspace, physical security, and social safety. Hate speech detection is therefore a critical issue for both cyberspace and physical society, and it calls for robust applications capable of detecting and combating hate speech in real time. Because hate speech detection is a context-dependent problem, it requires context-aware mechanisms. In this study, we employed a transformer-based model for Roman Urdu hate speech classification because of its ability to capture textual context. In addition, we developed the first pre-trained BERT model for Roman Urdu, which we named BERT-RU, by training BERT from scratch on the largest Roman Urdu dataset, consisting of 173,714 text messages. Traditional machine learning and deep learning models were used as baselines, the latter including LSTM, BiLSTM, BiLSTM + Attention Layer, and CNN. We also investigated transfer learning by combining pre-trained BERT embeddings with deep learning models. Each model was evaluated in terms of accuracy, precision, recall, and F-measure, and its generalization was assessed on a cross-domain dataset. The experimental results revealed that the transformer-based model, when applied directly to Roman Urdu hate speech classification, outperformed the traditional machine learning, deep learning, and pre-trained transformer-based models in accuracy, precision, recall, and F-measure, with scores of 96.70%, 97.25%, 96.74%, and 97.89%, respectively. In addition, the transformer-based model exhibited superior generalization on the cross-domain dataset.
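BERT-RU was trained from scratch, which for BERT means masked-language-model (MLM) pre-training on unlabeled text. The abstract does not give the masking settings, so the following is only a minimal sketch assuming the original BERT recipe: select about 15% of the non-special tokens, replace 80% of those with [MASK], 10% with a random token, and leave 10% unchanged. `mask_tokens`, `MASK_ID`, and all token ids are hypothetical illustrations, not the authors' code.

```python
import random
from typing import List, Optional, Set, Tuple

# Hypothetical ids; a real Roman Urdu tokenizer would define its own vocabulary.
MASK_ID = 4
IGNORE_INDEX = -100  # label meaning "no loss here" (PyTorch's default ignore_index)


def mask_tokens(
    token_ids: List[int],
    vocab_size: int,
    special_ids: Set[int],
    mask_prob: float = 0.15,
    rng: Optional[random.Random] = None,
) -> Tuple[List[int], List[int]]:
    """BERT-style MLM corruption of one tokenized sequence (assumed 80/10/10 scheme).

    Returns (inputs, labels): labels keep the original id at selected positions
    and IGNORE_INDEX elsewhere, so only selected tokens contribute to the loss.
    """
    rng = rng if rng is not None else random.Random()
    inputs = list(token_ids)
    labels = [IGNORE_INDEX] * len(token_ids)
    for i, tok in enumerate(token_ids):
        # Never corrupt special tokens such as [CLS], [SEP], or [PAD].
        if tok in special_ids or rng.random() >= mask_prob:
            continue
        labels[i] = tok  # the model must recover the original token here
        r = rng.random()
        if r < 0.8:
            inputs[i] = MASK_ID  # 80%: replace with [MASK]
        elif r < 0.9:
            inputs[i] = rng.randrange(vocab_size)  # 10%: replace with a random token
        # remaining 10%: keep the original token
    return inputs, labels


if __name__ == "__main__":
    # Hypothetical ids: 2 = [CLS], 3 = [SEP]; 101..104 stand for word pieces.
    seq = [2, 101, 102, 103, 104, 3]
    inputs, labels = mask_tokens(seq, vocab_size=30000, special_ids={0, 2, 3, 4},
                                 rng=random.Random(0))
    print(inputs)
    print(labels)
```

Keeping 10% of selected tokens unchanged is the usual argument for this scheme: the model cannot assume that every unmasked token is correct, which reduces the mismatch between pre-training (with [MASK]) and fine-tuning (without it).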

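One of the baselines is a BiLSTM with an attention layer. The abstract does not describe that layer, so this is a minimal sketch assuming additive (Bahdanau-style) attention pooling over the BiLSTM hidden states h_1..h_T (each h_t being the concatenated forward and backward state): every state gets a score e_t = v·tanh(W h_t + b), the scores are softmax-normalized into weights, and the weighted sum becomes the sentence vector passed to the classifier. Plain Python lists keep it self-contained; `attention_pool` and its parameters are hypothetical, and in a real model W, b, and v would be learned.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]
Matrix = Sequence[Sequence[float]]


def softmax(scores: Sequence[float]) -> Vector:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(H: Matrix, W: Matrix, b: Sequence[float],
                   v: Sequence[float]) -> Tuple[Vector, Vector]:
    """Additive attention pooling over hidden states H (T rows of size d).

    W is k x d; b and v have length k. Returns (context, alpha) where
    e_t = v . tanh(W h_t + b), alpha = softmax(e), context = sum_t alpha_t * h_t.
    """
    scores = []
    for h in H:
        # u_t = tanh(W h_t + b), one entry per row of W
        u = [math.tanh(sum(w * x for w, x in zip(row, h)) + bi)
             for row, bi in zip(W, b)]
        scores.append(sum(vi * ui for vi, ui in zip(v, u)))
    alpha = softmax(scores)
    d = len(H[0])
    # Weighted sum of hidden states, dimension by dimension.
    context = [sum(a * h[j] for a, h in zip(alpha, H)) for j in range(d)]
    return context, alpha


if __name__ == "__main__":
    # Toy BiLSTM outputs for 3 tokens (d = 2: one forward and one backward unit).
    H = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]
    W = [[1.0, 0.0], [0.0, 1.0]]
    context, alpha = attention_pool(H, W, b=[0.0, 0.0], v=[1.0, 0.0])
    print(alpha, context)
```

Compared with taking only the last hidden state, the weights let the classifier focus on the tokens most indicative of hate speech wherever they occur in the message.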
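The four reported metrics follow standard definitions. The abstract does not state whether the scores are computed for a single positive class or averaged over classes, so this sketch assumes binary labels with hate speech (label 1) as the positive class; `binary_metrics` is a hypothetical helper, and the toy labels are not data from the paper.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int],
                   positive: int = 1) -> Dict[str, float]:
    """Accuracy, precision, recall, and F1 for one positive class."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive and t == positive:
            tp += 1  # hateful message correctly flagged
        elif p == positive:
            fp += 1  # non-hateful message wrongly flagged
        elif t == positive:
            fn += 1  # hateful message missed
        else:
            tn += 1  # non-hateful message correctly passed
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 (F-measure) is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Toy labels: 1 = hate speech, 0 = not.
    y_true = [1, 1, 1, 1, 0, 0, 0, 0, 1, 0]
    y_pred = [1, 1, 1, 0, 0, 1, 1, 0, 1, 0]
    m = binary_metrics(y_true, y_pred)
    print({k: round(val, 3) for k, val in m.items()})
    # {'accuracy': 0.7, 'precision': 0.667, 'recall': 0.8, 'f1': 0.727}
```

Here tp = 4, fp = 2, fn = 1, tn = 3, so F1 = 2·4 / (2·4 + 2 + 1) = 8/11. Because F1 is a harmonic mean, for a single binary split it always falls between precision and recall.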


Similar articles

1. Roman Urdu Hate Speech Detection Using Transformer-Based Model for Cyber Security Applications.
   Sensors (Basel). 2023 Apr 12;23(8):3909. doi: 10.3390/s23083909.
2. Cyberbullying detection: advanced preprocessing techniques & deep learning architecture for Roman Urdu data.
   J Big Data. 2021;8(1):160. doi: 10.1186/s40537-021-00550-7. Epub 2021 Dec 22.
3. Detection of Hate Speech in COVID-19-Related Tweets in the Arab Region: Deep Learning and Topic Modeling Approach.
   J Med Internet Res. 2020 Dec 8;22(12):e22609. doi: 10.2196/22609.
4. Hate speech detection and racial bias mitigation in social media based on BERT model.
   PLoS One. 2020 Aug 27;15(8):e0237861. doi: 10.1371/journal.pone.0237861. eCollection 2020.
5. Detecting racism and xenophobia using deep learning models on Twitter data: CNN, LSTM and BERT.
   PeerJ Comput Sci. 2022 Mar 1;8:e906. doi: 10.7717/peerj-cs.906. eCollection 2022.
6. Multi-class sentiment analysis of urdu text using multilingual BERT.
   Sci Rep. 2022 Mar 31;12(1):5436. doi: 10.1038/s41598-022-09381-9.
7. Multi-label emotion classification of Urdu tweets.
   PeerJ Comput Sci. 2022 Apr 22;8:e896. doi: 10.7717/peerj-cs.896. eCollection 2022.
8. Comparing Pre-trained and Feature-Based Models for Prediction of Alzheimer's Disease Based on Speech.
   Front Aging Neurosci. 2021 Apr 27;13:635945. doi: 10.3389/fnagi.2021.635945. eCollection 2021.
9. A comparative study on deep learning models for text classification of unstructured medical notes with various levels of class imbalance.
   BMC Med Res Methodol. 2022 Jul 2;22(1):181. doi: 10.1186/s12874-022-01665-y.
10. Analysing Hate Speech against Migrants and Women through Tweets Using Ensembled Deep Learning Model.
    Comput Intell Neurosci. 2022 Apr 10;2022:8153791. doi: 10.1155/2022/8153791. eCollection 2022.

Cited by

1. Multimodal hate speech detection: a novel deep learning framework for multilingual text and images.
   PeerJ Comput Sci. 2025 Apr 16;11:e2801. doi: 10.7717/peerj-cs.2801. eCollection 2025.
