A network-based positive and unlabeled learning approach for fake news detection.

Authors

de Souza Mariana Caravanti, Nogueira Bruno Magalhães, Rossi Rafael Geraldeli, Marcacini Ricardo Marcondes, Dos Santos Brucce Neves, Rezende Solange Oliveira

Affiliations

ICMC-USP, São Carlos, 13566-590 Brazil.

FACOM-UFMS, Campo Grande, 79070-900 Brazil.

Publication

Mach Learn. 2022;111(10):3549-3592. doi: 10.1007/s10994-021-06111-6. Epub 2021 Nov 18.

Abstract

Fake news can spread rapidly among internet users and deceive a large audience. Because of these characteristics, it can have a direct impact on political and economic events. Machine learning approaches have been used to assist fake news identification. However, the spectrum of real news is broad and hard to characterize, and labeling is expensive because news is updated so frequently. One-Class Learning (OCL) and Positive and Unlabeled Learning (PUL) therefore emerge as interesting approaches for content-based fake news detection, since they need a smaller set of labeled data than traditional machine learning techniques. In particular, network-based approaches are well suited to fake news detection because they allow information from different aspects of a publication to be incorporated into the problem modeling. In this paper, we propose a network-based approach based on Positive and Unlabeled Learning by Label Propagation (PU-LP), a one-class and transductive semi-supervised learning algorithm that first identifies potential interest and non-interest documents in the unlabeled data and then uses label propagation to classify the remaining unlabeled documents. We assessed the performance of our proposal considering homogeneous (only documents) and heterogeneous (documents and terms) networks. Our comparative analysis considered four OCL algorithms extensively employed in one-class text classification (k-Means, k-Nearest Neighbors Density-based, One-Class Support Vector Machine, and Dense Autoencoder) and another traditional PUL algorithm (Rocchio Support Vector Machine). The algorithms were evaluated on three news collections, considering balanced and extremely unbalanced scenarios. We used Bag-of-Words and Doc2Vec models to transform news into structured data. Results indicated that PU-LP approaches are more stable and achieve better results than other PUL and OCL approaches in most scenarios, performing similarly to semi-supervised binary algorithms. Also, including terms in the news network yields better results, especially when news is distributed in the feature space according to veracity and subject. News representation using Doc2Vec achieved better results than the Bag-of-Words model for both the algorithms based on the vector-space model and those based on the document similarity network.
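
The two-stage idea described in the abstract (first pick likely interest and non-interest documents from the unlabeled data, then propagate labels over a document network) can be illustrated with a minimal sketch. This is not the authors' implementation of PU-LP: the ranking by mean cosine similarity to the labeled fake news, the k-nearest-neighbor graph, the simple averaging propagation, and every name and parameter below (`pu_label_propagation`, `n_reliable`, `k`, the toy vectors in `docs`) are assumptions chosen only for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def knn_graph(vectors, k):
    """Symmetric k-nearest-neighbor graph as {node: {neighbor: weight}}."""
    n = len(vectors)
    graph = {i: {} for i in range(n)}
    for i in range(n):
        sims = sorted(
            ((cosine(vectors[i], vectors[j]), j) for j in range(n) if j != i),
            reverse=True,
        )
        for s, j in sims[:k]:
            w = max(s, 0.0)  # keep edge weights non-negative
            graph[i][j] = w
            graph[j][i] = w
    return graph


def pu_label_propagation(vectors, positives, n_reliable, k=2, iterations=50):
    """Toy positive-and-unlabeled classifier (illustrative assumption only)."""
    n = len(vectors)
    pos_set = set(positives)
    unlabeled = [i for i in range(n) if i not in pos_set]
    if not pos_set or 2 * n_reliable > len(unlabeled):
        raise ValueError("need labeled positives and enough unlabeled documents")

    # Stage 1: rank unlabeled documents by mean similarity to labeled fake news.
    score = {
        i: sum(cosine(vectors[i], vectors[p]) for p in pos_set) / len(pos_set)
        for i in unlabeled
    }
    ranked = sorted(unlabeled, key=score.get, reverse=True)
    reliable_pos = set(ranked[:n_reliable])   # most similar: likely fake
    reliable_neg = set(ranked[-n_reliable:])  # least similar: likely not fake

    # Stage 2: clamp the seed labels and average neighbor scores over the graph.
    seeds = {i: 1.0 for i in pos_set | reliable_pos}
    seeds.update({i: 0.0 for i in reliable_neg})
    graph = knn_graph(vectors, k)
    f = {i: seeds.get(i, 0.5) for i in range(n)}
    for _ in range(iterations):
        new_f = {}
        for i in range(n):
            total = sum(graph[i].values())
            if i in seeds:
                new_f[i] = seeds[i]
            elif total:
                new_f[i] = sum(w * f[j] for j, w in graph[i].items()) / total
            else:
                new_f[i] = f[i]
        f = new_f
    return {i: "fake" if f[i] >= 0.5 else "not_fake" for i in range(n)}


# Hypothetical 2-D document vectors: indices 0-3 form one cluster, 4-7 another.
docs = [
    [1.0, 0.1], [0.9, 0.2], [0.8, 0.1], [0.95, 0.3],
    [0.1, 1.0], [0.2, 0.9], [0.1, 0.8], [0.3, 0.95],
]
labels = pu_label_propagation(docs, positives=[0], n_reliable=1, k=2)
print(labels)
```

Only document 0 is labeled here, yet the sketch assigns "fake" to the rest of its cluster and "not_fake" to the other cluster, which is the behavior a PUL method aims for. In practice the vectors would come from a Bag-of-Words or Doc2Vec representation, and a heterogeneous network would add term nodes alongside document nodes.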

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/bbea/8601374/49bb46f2e91b/10994_2021_6111_Fig1_HTML.jpg
