A network-based positive and unlabeled learning approach for fake news detection.

Authors

de Souza Mariana Caravanti, Nogueira Bruno Magalhães, Rossi Rafael Geraldeli, Marcacini Ricardo Marcondes, Dos Santos Brucce Neves, Rezende Solange Oliveira

Affiliations

ICMC-USP, São Carlos, 13566-590 Brazil.

FACOM-UFMS, Campo Grande, 79070-900 Brazil.

Publication information

Mach Learn. 2022;111(10):3549-3592. doi: 10.1007/s10994-021-06111-6. Epub 2021 Nov 18.

DOI:10.1007/s10994-021-06111-6

PMID:34815619

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8601374/

Abstract

Fake news can spread rapidly among internet users and deceive a large audience. Because of these characteristics, it can have a direct impact on political and economic events. Machine learning approaches have been used to assist fake news identification. However, the spectrum of real news is broad and hard to characterize, and labeling data is expensive because news is updated so frequently. One-Class Learning (OCL) and Positive and Unlabeled Learning (PUL) therefore emerge as promising approaches for content-based fake news detection, since they need a smaller set of labeled data than traditional machine learning techniques. In particular, network-based approaches are well suited to fake news detection because they allow information from different aspects of a publication to be incorporated into the problem modeling. In this paper, we propose a network-based approach based on Positive and Unlabeled Learning by Label Propagation (PU-LP), a one-class and transductive semi-supervised learning algorithm that performs classification in two steps: it first identifies potential interest and non-interest documents in the unlabeled data, and then employs label propagation to classify the remaining unlabeled documents. We assessed the performance of our proposal considering homogeneous (only documents) and heterogeneous (documents and terms) networks. Our comparative analysis considered four OCL algorithms extensively employed in one-class text classification (k-Means, k-Nearest Neighbors Density-based, One-Class Support Vector Machine, and Dense Autoencoder) and another traditional PUL algorithm (Rocchio Support Vector Machine). The algorithms were evaluated on three news collections, considering balanced and extremely unbalanced scenarios. We used Bag-of-Words and Doc2Vec models to transform news into structured data.
Results indicated that PU-LP approaches are more stable and achieve better results than other PUL and OCL approaches in most scenarios, performing similarly to semi-supervised binary algorithms. Also, including terms in the news network yields better results, especially when news items are distributed in the feature space according to veracity and subject. News representation using Doc2Vec achieved better results than the Bag-of-Words model for both the algorithms based on the vector-space model and those based on the document similarity network.
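
The two-step idea described in the abstract (pick likely non-interest documents from the unlabeled data, then propagate labels over a document network) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names `knn_graph` and `pu_label_propagation`, the parameters `k`, `n_negatives` and `n_iter`, the use of mean cosine similarity to choose reliable negatives, and the simple clamped propagation rule are all assumptions made for illustration. The published PU-LP procedure may select candidates and propagate labels differently.

```python
import numpy as np

def knn_graph(X, k):
    """Symmetric k-nearest-neighbor graph weighted by cosine similarity."""
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    S = Xn @ Xn.T
    np.fill_diagonal(S, -np.inf)            # never pick a document as its own neighbor
    idx = np.argsort(-S, axis=1)[:, :k]     # k most similar documents per row
    rows = np.repeat(np.arange(len(X)), k)
    W = np.zeros_like(S)
    W[rows, idx.ravel()] = np.clip(S[rows, idx.ravel()], 0.0, None)
    return np.maximum(W, W.T)               # make the graph undirected

def pu_label_propagation(X, positive_idx, n_negatives, k=3, n_iter=100):
    """Hypothetical two-step PU sketch: reliable negatives, then propagation."""
    X = np.asarray(X, dtype=float)
    positive_idx = np.asarray(positive_idx)
    n = len(X)
    unlabeled = np.setdiff1d(np.arange(n), positive_idx)

    # Step 1 (assumption): treat the unlabeled documents that are least
    # similar to the labeled positives as reliable non-interest documents.
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    mean_sim = (Xn[unlabeled] @ Xn[positive_idx].T).mean(axis=1)
    negative_idx = unlabeled[np.argsort(mean_sim)[:n_negatives]]

    # Step 2: propagate scores over the k-NN graph, clamping the
    # labeled positives to 1 and the reliable negatives to 0.
    W = knn_graph(X, k)
    deg = W.sum(axis=1)
    isolated = deg == 0
    P = W / np.where(isolated, 1.0, deg)[:, None]   # row-normalized weights
    F = np.full(n, 0.5)
    F[positive_idx], F[negative_idx] = 1.0, 0.0
    for _ in range(n_iter):
        F = np.where(isolated, F, P @ F)    # isolated nodes keep their score
        F[positive_idx], F[negative_idx] = 1.0, 0.0
    return F >= 0.5, negative_idx

# Toy usage: two well-separated groups of 2-D "document vectors",
# with a single labeled positive (document 0).
X_toy = np.array([[1.0, 0.1], [1.0, 0.2], [0.9, 0.1], [1.0, 0.0],
                  [0.1, 1.0], [0.2, 1.0], [0.1, 0.9], [0.0, 1.0]])
labels, negatives = pu_label_propagation(X_toy, [0], n_negatives=2, k=2)
print(labels, negatives)
```

In practice the rows of `X` would be Bag-of-Words or Doc2Vec vectors of the news items, and a heterogeneous variant would add term nodes to the graph; both are outside the scope of this sketch.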

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/bbea/8601374/49bb46f2e91b/10994_2021_6111_Fig1_HTML.jpg

Similar articles

A veracity dissemination consistency-based few-shot fake news detection framework by synergizing adversarial and contrastive self-supervised learning.

Sci Rep. 2024 Aug 22;14(1):19470. doi: 10.1038/s41598-024-70039-9.

Deep Ensemble Fake News Detection Model Using Sequential Deep Learning Technique.

Sensors (Basel). 2022 Sep 15;22(18):6970. doi: 10.3390/s22186970.

Detection of Turkish Fake News in Twitter with Machine Learning Algorithms.

Arab J Sci Eng. 2022;47(2):2359-2379. doi: 10.1007/s13369-021-06223-0. Epub 2021 Oct 1.

Detection of Fake News Text Classification on COVID-19 Using Deep Learning Approaches.

Comput Math Methods Med. 2021 Nov 15;2021:5514220. doi: 10.1155/2021/5514220. eCollection 2021.

CoAID-DEEP: An Optimized Intelligent Framework for Automated Detecting COVID-19 Misleading Information on Twitter.

IEEE Access. 2021 Feb 9;9:27840-27867. doi: 10.1109/ACCESS.2021.3058066. eCollection 2021.

Leveraging permutation testing to assess confidence in positive-unlabeled learning applied to high-dimensional biological datasets.

BMC Bioinformatics. 2024 Jun 19;25(1):218. doi: 10.1186/s12859-024-05834-2.

Multi-class motor imagery EEG classification using collaborative representation-based semi-supervised extreme learning machine.

Med Biol Eng Comput. 2020 Sep;58(9):2119-2130. doi: 10.1007/s11517-020-02227-4. Epub 2020 Jul 16.

Fake news detection in Urdu language using machine learning.

PeerJ Comput Sci. 2023 May 23;9:e1353. doi: 10.7717/peerj-cs.1353. eCollection 2023.

Intra-graph and Inter-graph joint information propagation network with third-order text graph tensor for fake news detection.

Appl Intell (Dordr). 2023 Feb 15:1-18. doi: 10.1007/s10489-023-04455-1.

Cited by

A review of semi-supervised learning for text classification.

Artif Intell Rev. 2023 Jan 31:1-69. doi: 10.1007/s10462-023-10393-8.

A systematic literature review and existing challenges toward fake news detection models.

Soc Netw Anal Min. 2022;12(1):168. doi: 10.1007/s13278-022-00995-5. Epub 2022 Nov 14.

