Machine Learning and Lexicon Approach to Texts Processing in the Detection of Degrees of Toxicity in Online Discussions.

Affiliation

Department of Cybernetics and Artificial Intelligence, Faculty of Electrical Engineering and Informatics, Technical University of Košice, Letná 9, 04200 Kosice, Slovakia.

Publication information

Sensors (Basel). 2022 Aug 27;22(17):6468. doi: 10.3390/s22176468.

DOI:10.3390/s22176468
PMID:36080927
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9459955/
Abstract

This article addresses the problem of detecting toxicity in online discussions. Toxicity is a serious problem at a time when people are strongly influenced by opinions on social networks. We offer a solution based on classification models that use machine learning methods to classify short social network texts into multiple degrees of toxicity. The classification models used both classic machine learning methods, such as naïve Bayes and SVM (support vector machine), and ensemble methods, such as bagging and RF (random forest). The models were trained on text data in the Slovak language, which we extracted from social networks. The labelling of our dataset of short texts into multiple classes (the degrees of toxicity) was provided automatically by our method based on the lexicon approach to text processing. This lexicon method required creating a dictionary of toxic words in the Slovak language, which is another contribution of this work. Finally, an application was created based on the trained machine learning models; it can be used to detect the degree of toxicity of new social network comments as well as to experiment with various machine learning methods. We achieved the best results using an SVM, with average accuracy = 0.89 and F1 = 0.79. This model also outperformed ensemble learning with the RF and bagging methods; however, the ensemble learning methods achieved better results than the naïve Bayes method.
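The lexicon-based labelling step described in the abstract can be illustrated with a minimal sketch. It is not the authors' method: the abstract does not give the dictionary contents, the scoring rule, or the number of toxicity degrees, so the English placeholder words, the weights, the thresholds, and the four-level scale (0 to 3) below are all assumptions made for illustration only.

```python
import re

# Hypothetical toy lexicon: word -> toxicity weight. The dictionary in the
# article is Slovak; these English placeholders and weights are invented.
TOXIC_LEXICON = {"stupid": 1, "idiot": 2, "hate": 2, "scum": 3}

# Hypothetical thresholds as (minimum score, degree), checked highest first.
DEGREE_THRESHOLDS = [(4, 3), (2, 2), (1, 1)]


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def toxicity_score(text):
    """Sum the lexicon weights of every token that appears in the lexicon."""
    return sum(TOXIC_LEXICON.get(token, 0) for token in tokenize(text))


def toxicity_degree(text):
    """Map the lexicon score to a degree from 0 (non-toxic) to 3."""
    score = toxicity_score(text)
    for min_score, degree in DEGREE_THRESHOLDS:
        if score >= min_score:
            return degree
    return 0


def label_dataset(comments):
    """Attach an automatically derived toxicity degree to each comment."""
    return [(comment, toxicity_degree(comment)) for comment in comments]


if __name__ == "__main__":
    for comment, degree in label_dataset(["Have a nice day", "I hate you, idiot!"]):
        print(degree, comment)
```

Labels produced this way could then serve as training targets for classifiers such as naïve Bayes, SVM, bagging, or random forest, which is the role the abstract assigns to the automatic labelling.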



Similar articles

1
Machine Learning and Lexicon Approach to Texts Processing in the Detection of Degrees of Toxicity in Online Discussions.
Sensors (Basel). 2022 Aug 27;22(17):6468. doi: 10.3390/s22176468.
2
Comparison of Machine Learning and Sentiment Analysis in Detection of Suspicious Online Reviewers on Different Type of Data.
Sensors (Basel). 2021 Dec 27;22(1):155. doi: 10.3390/s22010155.
3
Detection of emotion by text analysis using machine learning.
Front Psychol. 2023 Sep 20;14:1190326. doi: 10.3389/fpsyg.2023.1190326. eCollection 2023.
4
Automated Amharic News Categorization Using Deep Learning Models.
Comput Intell Neurosci. 2021 Jul 27;2021:3774607. doi: 10.1155/2021/3774607. eCollection 2021.
5
Tracking financing for global common goods for health: A machine learning approach using natural language processing techniques.
Front Public Health. 2022 Nov 17;10:1031147. doi: 10.3389/fpubh.2022.1031147. eCollection 2022.
6
Development of a patients' satisfaction analysis system using machine learning and lexicon-based methods.
BMC Health Serv Res. 2023 Mar 23;23(1):280. doi: 10.1186/s12913-023-09260-7.
7
An Aggregated Mutual Information Based Feature Selection with Machine Learning Methods for Enhancing IoT Botnet Attack Detection.
Sensors (Basel). 2021 Dec 28;22(1):185. doi: 10.3390/s22010185.
8
Heterogeneous Ensemble Deep Learning Model for Enhanced Arabic Sentiment Analysis.
Sensors (Basel). 2022 May 12;22(10):3707. doi: 10.3390/s22103707.
9
Predicting Chronic Kidney Disease Using Hybrid Machine Learning Based on Apache Spark.
Comput Intell Neurosci. 2022 Feb 23;2022:9898831. doi: 10.1155/2022/9898831. eCollection 2022.
10
Construction accident narrative classification: An evaluation of text mining techniques.
Accid Anal Prev. 2017 Nov;108:122-130. doi: 10.1016/j.aap.2017.08.026. Epub 2017 Sep 1.

Cited by

1
Sensors Data Processing Using Machine Learning.
Sensors (Basel). 2024 Mar 6;24(5):1694. doi: 10.3390/s24051694.
2
Pashto offensive language detection: a benchmark dataset and monolingual Pashto BERT.
PeerJ Comput Sci. 2023 Oct 18;9:e1617. doi: 10.7717/peerj-cs.1617. eCollection 2023.
3
Detection of emotion by text analysis using machine learning.
Front Psychol. 2023 Sep 20;14:1190326. doi: 10.3389/fpsyg.2023.1190326. eCollection 2023.
