Detection of hate: speech tweets based convolutional neural network and machine learning algorithms.

Affiliations

Department of Mathematics, Faculty of Science, Aswan University, Aswân, Egypt.

Electrical Engineering Department, Faculty of Energy Engineering, Aswan University, Aswân, Egypt.

Publication details

Sci Rep. 2024 Nov 21;14(1):28870. doi: 10.1038/s41598-024-76632-2.

Abstract

Social media sites have brought many benefits, such as continuous information sharing and easy communication with others. Alongside these advantages come disadvantages that still need solutions, and one of them is the spread of hate speech. In this study, we address the problem with a Term Frequency-Inverse Document Frequency (TF-IDF) based approach to feature engineering, applied across eleven machine learning and deep learning classifiers that automatically identify hate speech. Three datasets were used: the first is "Hate speech offensive tweets by Davidson et al.", the second is "Twitter hate speech", and the third was formed by merging the second with the "Cyberbullying dataset (toxicity_parsed_dataset)". The classifiers are Logistic Regression (LR), Naive Bayes (NB), Multi-layer Perceptron (MLP), Support Vector Machine (SVM), Random Forest (RF), K-Nearest Neighbor (KNN), K-Means, Decision Tree (DT), Gradient Boosting Classifier (GBC), and Extra Trees (ET), in addition to a convolutional neural network (CNN). The maximum accuracy attained exceeded 99%.
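To make the TF-IDF feature step concrete, the following is a minimal sketch and not the authors' implementation. It assumes whitespace tokenization, a smoothed IDF of the form log((1 + N) / (1 + df)) + 1, L2-normalised vectors, and a small logistic regression (one of the eleven classifier types named above) trained by plain stochastic gradient descent. The six example texts, their labels, and all function names are hypothetical and exist only for illustration; none of them come from the datasets in the study.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (an assumed, deliberately simple tokenizer)."""
    return text.lower().split()


def fit_idf(docs):
    """Smoothed inverse document frequency: log((1 + N) / (1 + df)) + 1."""
    n_docs = len(docs)
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update(set(tokenize(doc)))
    return {term: math.log((1 + n_docs) / (1 + df)) + 1.0
            for term, df in doc_freq.items()}


def tfidf_vector(doc, idf):
    """Sparse L2-normalised TF-IDF vector; terms unseen during fitting are dropped."""
    term_freq = Counter(t for t in tokenize(doc) if t in idf)
    raw = {t: count * idf[t] for t, count in term_freq.items()}
    norm = math.sqrt(sum(w * w for w in raw.values()))
    return {t: w / norm for t, w in raw.items()} if norm else {}


def score(vec, weights, bias):
    """Linear score of a sparse vector under the current model."""
    return bias + sum(weights.get(t, 0.0) * v for t, v in vec.items())


def train_logreg(vectors, labels, epochs=200, lr=0.5):
    """Binary logistic regression fitted with stochastic gradient descent."""
    weights, bias = {}, 0.0
    for _ in range(epochs):
        for vec, y in zip(vectors, labels):
            p = 1.0 / (1.0 + math.exp(-score(vec, weights, bias)))
            err = p - y  # gradient of the log loss with respect to the score
            bias -= lr * err
            for t, v in vec.items():
                weights[t] = weights.get(t, 0.0) - lr * err * v
    return weights, bias


def predict(doc, idf, weights, bias):
    """Return 1 (flagged as hateful) or 0 (not flagged)."""
    return 1 if score(tfidf_vector(doc, idf), weights, bias) > 0 else 0


# Hypothetical toy corpus: 1 = hateful, 0 = not hateful.
train_texts = [
    "i hate those people so much",
    "they are disgusting and i hate them",
    "what a lovely sunny day",
    "i love this new song",
    "those people are disgusting",
    "have a lovely day everyone",
]
train_labels = [1, 1, 0, 0, 1, 0]

idf = fit_idf(train_texts)
train_vectors = [tfidf_vector(t, idf) for t in train_texts]
weights, bias = train_logreg(train_vectors, train_labels)

for text in ["i hate those disgusting people", "a lovely sunny song"]:
    print(text, "->", predict(text, idf, weights, bias))
```

The same TF-IDF vectors could be passed to any of the other listed classifiers. In practice a library vectorizer and classifier would normally replace these hand-written functions, and a corpus this small says nothing about the accuracy reported in the abstract.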

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6a6a/11582668/4519cf7d3677/41598_2024_76632_Fig1_HTML.jpg
