Detection of hate: speech tweets based convolutional neural network and machine learning algorithms.

Affiliations

Department of Mathematics, Faculty of Science, Aswan University, Aswân, Egypt.

Electrical Engineering Department, Faculty of Energy Engineering, Aswan University, Aswân, Egypt.

Publication information

Sci Rep. 2024 Nov 21;14(1):28870. doi: 10.1038/s41598-024-76632-2.

DOI: 10.1038/s41598-024-76632-2

PMID: 39572576

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11582668/

Abstract

There is no doubt that social media sites have brought humanity many benefits, such as continuous information sharing and easy communication with others. Alongside these advantages, however, they have disadvantages for which we are still seeking solutions, and one of them is the spread of hate speech. In our study, we address this phenomenon with an approach that uses Term Frequency-Inverse Document Frequency (TF-IDF) for feature engineering, applied to eleven machine learning and deep learning classifiers that can automatically identify hate speech. Three datasets were used: the first is "Hate speech offensive tweets by Davidson et al.", the second is "Twitter hate speech", and the third was created by merging the second dataset with the Cyberbullying dataset (toxicity_parsed_dataset). The classifiers are Logistic Regression (LR), Naive Bayes (NB), Multi-layer Perceptron (MLP), Support Vector Machine (SVM), Random Forest (RF), K-Nearest Neighbor (KNN), K-Means, Decision Tree (DT), Gradient Boosting Classifier (GBC), and Extra Trees (ET), in addition to a convolutional neural network (CNN). The maximum accuracy attained exceeded 99%.
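The abstract names TF-IDF feature engineering followed by conventional classifiers but gives no implementation details. The following is a minimal pure-Python sketch of that pipeline shape, not the authors' code. It assumes whitespace tokenization, raw term counts, a smoothed IDF of the form idf(t) = ln((1 + n) / (1 + df(t))) + 1 with L2 normalization (the IDF formula and normalization that scikit-learn's `TfidfVectorizer` uses by default), and a cosine-similarity K-Nearest Neighbor classifier, chosen because it is one of the eleven listed and is short to write out. The helper names (`tokenize`, `fit_idf`, `tfidf`, `knn_predict`, `classify`) and the four training sentences are hypothetical and are not taken from the Davidson et al., Twitter hate speech, or toxicity_parsed_dataset data.

```python
import math
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Hypothetical preprocessing: lowercase and split on whitespace only.
    return text.lower().split()


def fit_idf(docs: list[list[str]]) -> dict[str, float]:
    """Smoothed IDF: idf(t) = ln((1 + n) / (1 + df(t))) + 1."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {t: math.log((1 + n) / (1 + d)) + 1 for t, d in df.items()}


def tfidf(doc: list[str], idf: dict[str, float]) -> dict[str, float]:
    """Raw term count times IDF, L2-normalized; terms unseen in training are dropped."""
    counts = Counter(t for t in doc if t in idf)
    vec = {t: c * idf[t] for t, c in counts.items()}
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else {}


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    # Both vectors are unit length, so the dot product equals cosine similarity.
    return sum(w * b.get(t, 0.0) for t, w in a.items())


def knn_predict(query, train_vecs, labels, k=1):
    # Rank training texts by similarity to the query; majority vote over the top k.
    ranked = sorted(range(len(train_vecs)),
                    key=lambda i: cosine(query, train_vecs[i]),
                    reverse=True)
    votes = Counter(labels[i] for i in ranked[:k])
    return votes.most_common(1)[0][0]


# Invented toy data (1 = offensive, 0 = neutral); not from the study's datasets.
train = [
    ("you are a stupid idiot", 1),
    ("go away you idiot", 1),
    ("have a lovely day friends", 0),
    ("thanks for sharing this lovely photo", 0),
]
train_tokens = [tokenize(text) for text, _ in train]
labels = [label for _, label in train]
idf = fit_idf(train_tokens)
train_vecs = [tfidf(doc, idf) for doc in train_tokens]


def classify(text: str, k: int = 1) -> int:
    return knn_predict(tfidf(tokenize(text), idf), train_vecs, labels, k)


print(classify("what an idiot"))  # → 1
print(classify("lovely day"))     # → 0
```

The other listed classifiers (for example LR or SVM) would consume the same TF-IDF vectors in place of `knn_predict`; in practice a library such as scikit-learn would replace these hand-written helpers. The CNN named in the abstract is not covered by this sketch.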

Figure 1 (image): https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6a6a/11582668/4519cf7d3677/41598_2024_76632_Fig1_HTML.jpg
