Wang Yuhang, Wang Li, Yang Yanjie, Lian Tao
Data Science College, Taiyuan University of Technology, Jinzhong, Shanxi, 030600, China.
Expert Syst Appl. 2021 Mar 15;166:114090. doi: 10.1016/j.eswa.2020.114090. Epub 2020 Oct 3.
The wide spread of fake news has caused huge losses to both governments and the public. Many existing fake news detection methods rely on propagation information, such as propagators' profiles and the propagation structure. However, such methods face difficulties in data collection and cannot detect fake news at an early stage. An alternative approach is to detect fake news based solely on its content. Early content-based methods rely on manually designed linguistic features. Such shallow features are domain-dependent and do not generalize easily to cross-domain data. Recently, many natural language processing tasks have turned to deep learning methods to learn word, sentence, and document representations. In this paper, we propose a novel graph-based neural network model named SemSeq4FD for early fake news detection based on enhanced text representations. In SemSeq4FD, we model the global pairwise semantic relations between sentences as a complete graph and learn global sentence representations via a graph convolutional network with a self-attention mechanism. Because local context is important in conveying sentence meaning, we employ a 1D convolutional network to learn local sentence representations. The two representations are combined to form enhanced sentence representations. An LSTM-based network then models the sequence of enhanced sentence representations, yielding the final document representation used for fake news detection. Experiments on four real-world datasets in English and Chinese, including cross-source and cross-domain datasets, demonstrate that our model can outperform state-of-the-art methods.
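The abstract gives no equations or hyperparameters, so the following is only a minimal, untrained sketch of the data flow it describes, written in plain Python with random weights. Several choices are assumptions, not details taken from the paper: the sentence vectors fed to the graph branch are the mean of their word vectors; the edge weights of the complete sentence graph are scaled dot-product self-attention scores normalized with a softmax; the GCN is a single ReLU layer; the local branch is a 1D convolution over each sentence's word vectors followed by max pooling; the two representations are combined by concatenation; the LSTM has no bias terms and its last hidden state serves as the document representation; and a logistic output produces the fake-news score. All function names (`semseq_sketch`, `attention_adjacency`, `gcn_layer`, `conv1d_maxpool`, `lstm`) are hypothetical.

```python
import math
import random


def matvec(W, x):
    # Multiply a matrix (list of rows) by a vector.
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def rand_matrix(rows, cols, rng):
    # Small random weights; this sketch is untrained.
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def attention_adjacency(S):
    """Complete sentence graph whose edge weights are softmax-normalized
    scaled dot-product attention scores (assumed form)."""
    d = len(S[0])
    return [softmax([sum(a * b for a, b in zip(si, sj)) / math.sqrt(d) for sj in S])
            for si in S]


def gcn_layer(A, H, W):
    """One graph-convolution step: each node aggregates the projected
    features of every node, weighted by A, then applies ReLU."""
    HW = [matvec(W, h) for h in H]
    out = []
    for row in A:
        agg = [sum(a * hw[k] for a, hw in zip(row, HW)) for k in range(len(HW[0]))]
        out.append([max(0.0, v) for v in agg])
    return out


def conv1d_maxpool(words, filters, width):
    """Local sentence representation: ReLU 1D convolution over word
    vectors, max-pooled over positions (assumed design)."""
    d = len(words[0])
    # Zero-pad short sentences so at least one window exists.
    padded = words + [[0.0] * d] * max(0, width - len(words))
    windows = [sum(padded[i:i + width], []) for i in range(len(padded) - width + 1)]
    return [max(max(0.0, sum(f * x for f, x in zip(filt, win))) for win in windows)
            for filt in filters]


def lstm(seq, params, hidden):
    """Single-layer LSTM without biases; returns the final hidden state."""
    h = [0.0] * hidden
    c = [0.0] * hidden
    for x in seq:
        z = x + h  # concatenate current input with previous hidden state
        i = [sigmoid(v) for v in matvec(params["i"], z)]
        f = [sigmoid(v) for v in matvec(params["f"], z)]
        o = [sigmoid(v) for v in matvec(params["o"], z)]
        g = [math.tanh(v) for v in matvec(params["g"], z)]
        c = [fi * ci + ii * gi for fi, ci, ii, gi in zip(f, c, i, g)]
        h = [oi * math.tanh(ci) for oi, ci in zip(o, c)]
    return h


def semseq_sketch(doc, dim=8, hidden=6, n_filters=4, width=2, seed=0):
    """doc: list of sentences, each a list of word vectors of length dim.
    Returns an (untrained) probability that the document is fake."""
    rng = random.Random(seed)
    # Global branch: mean word vector per sentence -> attention graph -> GCN.
    S = [[sum(col) / len(sent) for col in zip(*sent)] for sent in doc]
    G = gcn_layer(attention_adjacency(S), S, rand_matrix(dim, dim, rng))
    # Local branch: 1D convolution within each sentence.
    filters = rand_matrix(n_filters, width * dim, rng)
    L = [conv1d_maxpool(sent, filters, width) for sent in doc]
    # Enhanced sentence representations: concatenation of both branches.
    E = [g + l for g, l in zip(G, L)]
    params = {k: rand_matrix(hidden, dim + n_filters + hidden, rng) for k in "ifog"}
    doc_vec = lstm(E, params, hidden)
    return sigmoid(matvec(rand_matrix(1, hidden, rng), doc_vec)[0])


# Usage: a toy document of three sentences with 5, 3 and 7 random word vectors.
data_rng = random.Random(42)
doc = [[[data_rng.gauss(0, 1) for _ in range(8)] for _ in range(n)] for n in (5, 3, 7)]
print(semseq_sketch(doc))
```

Because the weights are random, the printed score carries no meaning; the sketch only illustrates how a global (graph) view and a local (convolutional) view of each sentence could be merged before sequence modeling. A real implementation would use a deep learning framework, trained parameters, and the paper's exact layer definitions.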