A Multiple change-point detection framework on linguistic characteristics of real versus fake news articles.

Affiliations

Computer Science Department, University of Cyprus, Nicosia, Cyprus.

Department of Mathematics and Statistics, University of Cyprus, Nicosia, Cyprus.

Publication information

Sci Rep. 2023 Apr 13;13(1):6086. doi: 10.1038/s41598-023-32952-3.

Abstract

Extracting information from the text of news articles has proven important for building efficient fake news detection systems. In particular, to fight disinformation, researchers have concentrated on linguistic characteristics that are common in fake news and can help detect false content automatically. Although these approaches have been shown to perform well, the research community has also shown that both language and word use in literature evolve. The objective of this paper is therefore to explore how the linguistic characteristics of fake and real news change over time. To this end, we establish a large dataset containing linguistic characteristics of various articles over the years. In addition, we introduce a novel framework in which articles are classified into specified topics based on their content and the most informative linguistic features are extracted using dimensionality reduction methods. Finally, the framework detects changes in the extracted linguistic features of real and fake news articles over time by incorporating a novel change-point detection method. Applying our framework to the established dataset, we found that the linguistic characteristics of an article's title appear to be particularly important in capturing notable movements in the similarity level between "Fake" and "Real" articles.
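The abstract does not specify the dimensionality reduction or change-point methods, so the following is only a minimal sketch of the general idea, not the authors' framework. It swaps in two standard techniques: principal component analysis (via SVD) to reduce a feature matrix to one score per time step, and CUSUM-based binary segmentation to locate mean shifts in that score. The data are synthetic, and every name, the threshold, and the minimum segment size are assumptions chosen for illustration.

```python
import numpy as np


def first_principal_component(X):
    """Scores of the rows of X on the first principal component (via SVD)."""
    Xc = X - X.mean(axis=0)
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ vt[0]


def best_split(y, min_size):
    """Best single mean-shift split of y: returns (index, CUSUM statistic)."""
    n = len(y)
    best_k, best_stat = None, 0.0
    for k in range(min_size, n - min_size + 1):
        scale = np.sqrt(k * (n - k) / n)
        stat = scale * abs(y[:k].mean() - y[k:].mean())
        if stat > best_stat:
            best_k, best_stat = k, stat
    return best_k, best_stat


def binary_segmentation(y, threshold, min_size=5, offset=0):
    """Recursively split y wherever the CUSUM statistic exceeds threshold."""
    y = np.asarray(y, dtype=float)
    if len(y) < 2 * min_size:
        return []
    k, stat = best_split(y, min_size)
    if k is None or stat <= threshold:
        return []
    left = binary_segmentation(y[:k], threshold, min_size, offset)
    right = binary_segmentation(y[k:], threshold, min_size, offset + k)
    return left + [offset + k] + right


# Synthetic stand-in for per-period linguistic features (assumption):
# a hidden level that shifts at time steps 40 and 80, mixed into 5 features.
rng = np.random.default_rng(0)
level = np.repeat([0.0, 2.0, 0.0], 40)
loadings = np.array([1.0, 0.5, -0.5, 0.2, 0.0])
X = np.outer(level, loadings) + rng.normal(scale=0.05, size=(120, 5))

scores = first_principal_component(X)
change_points = binary_segmentation(scores, threshold=1.0)
print(change_points)
```

On real data the threshold would have to be calibrated to the noise level of the series. The fixed value used here only suits this synthetic example.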

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0f2d/10101974/63d13811be87/41598_2023_32952_Fig1_HTML.jpg
