HAPI: An efficient Hybrid Feature Engineering-based Approach for Propaganda Identification in social media.

Affiliations

Dept. of Computer Sciences & Software Engineering-CIT, United Arab Emirates University, Al Ain, United Arab Emirates.

EIAS Data Science Lab, College of Computer and Information Sciences, Prince Sultan University, Riyadh, Saudi Arabia.

Publication information

PLoS One. 2024 Jul 10;19(7):e0302583. doi: 10.1371/journal.pone.0302583. eCollection 2024.

Abstract

Social media platforms serve as communication tools where users freely share information regardless of its accuracy. Propaganda on these platforms is the dissemination of biased or deceptive information intended to influence public opinion; it takes many forms, including political campaigns, fake news, and conspiracy theories. This study introduces a Hybrid Feature Engineering Approach for Propaganda Identification (HAPI), designed to detect propaganda in text-based content such as news articles and social media posts. HAPI combines conventional feature engineering methods with machine learning techniques to achieve high accuracy in propaganda detection. The study uses data collected from Twitter through its API and proposes an annotation scheme that labels tweets into two classes (propaganda and non-propaganda). Hybrid feature engineering combines several feature types, including Term Frequency-Inverse Document Frequency (TF-IDF), Bag of Words (BoW), sentiment features, and tweet length. Multiple machine learning classifiers are trained and evaluated with the proposed methodology, using 40 relevant features chosen by the hybrid feature selection technique. All selected algorithms, including Multinomial Naive Bayes (MNB), Support Vector Machine (SVM), Decision Tree (DT), and Logistic Regression (LR), achieved promising results. The SVM-based variant of HAPI (SVM-HaPi) performs best among the traditional algorithms, with a precision, recall, and F-measure of 0.69 each and an overall accuracy of 69.2%. The proposed approach is also compared with well-known existing approaches and outperforms most of them on several evaluation metrics. This research contributes to the development of a comprehensive system for identifying propaganda in textual content.
However, propaganda detection extends beyond text. Deep learning algorithms such as Artificial Neural Networks (ANN) can handle multimodal data combining text, images, audio, and video, and can therefore take into account not only the content itself but also how it is presented and the context in which it is disseminated.
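
The hybrid feature vector described in the abstract can be sketched in plain Python as below. This is a minimal illustration under stated assumptions, not the authors' implementation: the tokenizer, the toy sentiment lexicon (`POSITIVE`/`NEGATIVE`), and all function names are hypothetical, and the IDF formula mirrors scikit-learn's default `smooth_idf=True` setting without its L2 normalization. The hybrid feature selection step that reduces the vector to 40 features is not shown, because the abstract does not describe it. The `binary_metrics` helper shows how precision, recall, F-measure, and accuracy are computed for the propaganda class; the reported 0.69 scores may instead be averaged over both classes.

```python
import math
import re
from collections import Counter

# Hypothetical toy sentiment lexicon (assumption): the abstract does not name
# the sentiment resource HAPI uses.
POSITIVE = {"good", "great", "win", "hope"}
NEGATIVE = {"bad", "fake", "lie", "enemy", "traitor"}


def tokenize(text):
    """Lowercase a tweet and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def build_vocab(docs):
    """Assign each distinct corpus token a fixed column index."""
    tokens = sorted({tok for doc in docs for tok in tokenize(doc)})
    return {tok: i for i, tok in enumerate(tokens)}


def idf_weights(docs, vocab):
    """Smoothed IDF per vocabulary column: ln((1 + N) / (1 + df)) + 1."""
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(tokenize(doc)))  # count each token once per document
    return [math.log((1 + n) / (1 + df[tok])) + 1 for tok in sorted(vocab, key=vocab.get)]


def hybrid_features(text, vocab, idf):
    """Concatenate BoW counts, TF-IDF weights, a sentiment score and tweet length."""
    tokens = tokenize(text)
    bow = [0] * len(vocab)
    for tok, count in Counter(tokens).items():
        if tok in vocab:  # ignore tokens unseen when the vocabulary was built
            bow[vocab[tok]] = count
    total = max(len(tokens), 1)  # avoid division by zero for empty tweets
    tfidf = [bow[i] / total * idf[i] for i in range(len(vocab))]
    pos = sum(tok in POSITIVE for tok in tokens)
    neg = sum(tok in NEGATIVE for tok in tokens)
    sentiment = (pos - neg) / total  # lies in [-1, 1]
    return bow + tfidf + [sentiment, len(text)]  # len(text): length in characters


def binary_metrics(y_true, y_pred, positive=1):
    """Precision, recall, F-measure and accuracy with `positive` as the target class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    return precision, recall, f1, accuracy


if __name__ == "__main__":
    tweets = ["The enemy spreads fake news", "Great game today"]
    vocab = build_vocab(tweets)
    idf = idf_weights(tweets, vocab)
    print(hybrid_features(tweets[0], vocab, idf))
    print(binary_metrics([1, 1, 0, 0], [1, 0, 0, 1]))  # (0.5, 0.5, 0.5, 0.5)
```

The resulting vectors could be passed to any of the classifiers named above. Note that scikit-learn's `MultinomialNB` rejects negative inputs, so the sentiment score would need to be shifted (for example by +1) or dropped before training MNB.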

Fig 1. https://cdn.ncbi.nlm.nih.gov/pmc/blobs/33ae/11236156/098992e70dd5/pone.0302583.g001.jpg