Computer Science Department, University of Cyprus, Nicosia, Cyprus.
Department of Mathematics and Statistics, University of Cyprus, Nicosia, Cyprus.
Sci Rep. 2023 Apr 13;13(1):6086. doi: 10.1038/s41598-023-32952-3.
Extracting information from the text of news articles has proven important for building effective fake news detection systems. Specifically, to fight disinformation, researchers have concentrated on exploiting linguistic characteristics that are common in fake news and can help detect false content automatically. Although these approaches achieve high performance, the research community has shown that both language and word usage in the literature evolve over time. Therefore, the objective of this paper is to explore the linguistic characteristics of fake and real news over time. To achieve this, we establish a large dataset containing the linguistic characteristics of various articles over the years. In addition, we introduce a novel framework in which articles are classified into specified topics based on their content, and the most informative linguistic features are extracted using dimensionality reduction methods. Finally, the framework detects changes over time in the extracted linguistic features of real and fake news articles by incorporating a novel change-point detection method. Applying our framework to the established dataset, we observed that the linguistic characteristics concerning the article's title appear to be particularly important in capturing significant movements in the similarity level of "Fake" and "Real" articles.
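The abstract does not specify which dimensionality reduction method or which change-point detector the framework uses, so the following is only a minimal sketch under stated assumptions, not the authors' method. It assumes that (a) PCA, computed via SVD on standardized features, stands in for the dimensionality reduction step; (b) the "similarity level" between Fake and Real articles is approximated by the cosine similarity of their per-period mean feature vectors; and (c) a standard least-squares single mean-shift change point stands in for the paper's novel detector. The function names and the `(n_periods, n_features)` data layout are hypothetical.

```python
import numpy as np


def top_features_pca(X: np.ndarray, n_top: int = 3) -> np.ndarray:
    """Rank features by absolute loading on the first principal component.

    Assumption: PCA stands in for the unspecified dimensionality reduction.
    X has shape (n_articles, n_features).
    """
    std = X.std(axis=0)
    std = np.where(std == 0, 1.0, std)  # avoid division by zero for constant features
    Z = (X - X.mean(axis=0)) / std      # standardize so feature scales are comparable
    _, _, vt = np.linalg.svd(Z, full_matrices=False)
    return np.argsort(-np.abs(vt[0]))[:n_top]  # indices of the strongest loadings


def similarity_series(fake: np.ndarray, real: np.ndarray) -> np.ndarray:
    """Cosine similarity between Fake and Real mean feature vectors per period.

    Hypothetical layout: fake[t] and real[t] are the mean feature vectors
    of articles published in period t, each of shape (n_features,).
    """
    num = np.sum(fake * real, axis=1)
    den = np.linalg.norm(fake, axis=1) * np.linalg.norm(real, axis=1)
    return num / den


def single_change_point(x: np.ndarray, min_size: int = 2) -> int:
    """Return k splitting x into x[:k] and x[k:] with minimal total
    within-segment squared deviation (a single shift in the mean).

    Assumption: a generic offline detector, not the paper's novel method.
    Returns -1 if x is too short to hold two segments of min_size.
    """
    n = len(x)
    best_k, best_cost = -1, np.inf
    for k in range(min_size, n - min_size + 1):
        left, right = x[:k], x[k:]
        cost = np.sum((left - left.mean()) ** 2) + np.sum((right - right.mean()) ** 2)
        if cost < best_cost:
            best_k, best_cost = k, cost
    return best_k
```

For example, a similarity series that drops from 0.9 to 0.5 halfway through, such as `single_change_point(np.array([0.9] * 4 + [0.5] * 4))`, yields a change point at index 4. Detecting several change points could be done by applying the same split recursively to each segment (binary segmentation), which is again a generic choice rather than the one described in the paper.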