Ali Ghulam, Malik Muhammad Shahid Iqbal
Department of Computer Science, COMSATS University Islamabad, Attock Campus, Islamabad, Pakistan.
Department of Computer Science, Capital University of Science and Technology, Kahuta Road, Sihala, 44000 Islamabad, Pakistan.
Multimed Tools Appl. 2023;82(5):7017-7038. doi: 10.1007/s11042-022-13595-4. Epub 2022 Aug 12.
Social microblogs are among the most popular platforms for spreading information. Despite their advantages, however, these platforms are also used to spread rumours. At present, most existing approaches identify rumours at the topic level rather than at the tweet/post level. Moreover, prior content-based studies used sentiment and linguistic features for rumour identification without considering discrete positive and negative emotions or effective part-of-speech features. Similarly, most prior studies relied on content-based approaches for feature generation and did not explore recent context-based approaches. To address these gaps, this paper presents a robust framework for rumour detection at the tweet level. The model combines context-based features, namely word2vec embeddings and bidirectional encoder representations from transformers (BERT), with content-based features: discrete emotions, linguistic characteristics, and metadata. To the best of our knowledge, we are the first to use these features for rumour identification at the tweet/post level. The framework is tested on four real-life Twitter microblog datasets. The results show that the detection model is capable of detecting 97%, 86%, 85%, and 80% of rumours on the four datasets, respectively. In addition, the proposed framework outperformed three recent state-of-the-art baselines. As stand-alone models, BERT performed best among the context-based approaches, and linguistic features performed best among the content-based approaches. Moreover, applying two-step feature selection further improves the performance of the detection model.
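The feature pipeline described in the abstract (context-based and content-based features joined per tweet, then reduced by two-step feature selection) can be illustrated with a minimal sketch. The abstract does not say what the two selection steps are, so the variance filter and the correlation ranking below are assumptions chosen only for illustration, not the authors' method. The toy vectors stand in for word2vec/BERT outputs and for emotion, linguistic, and metadata features; all function names are hypothetical.

```python
from statistics import mean, pvariance


def combine_features(context_vec, content_vec):
    """Concatenate context-based and content-based features for one tweet."""
    return list(context_vec) + list(content_vec)


def variance_filter(rows, min_variance=1e-3):
    """Step 1 (assumed): keep the column indices whose variance exceeds a threshold."""
    n_cols = len(rows[0])
    return [j for j in range(n_cols)
            if pvariance([row[j] for row in rows]) > min_variance]


def abs_correlation(xs, ys):
    """Absolute Pearson correlation; 0.0 when either input is constant."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return abs(cov) / (vx * vy) ** 0.5


def two_step_selection(rows, labels, k, min_variance=1e-3):
    """Step 1: variance filter. Step 2 (assumed): top-k by correlation with the label."""
    kept = variance_filter(rows, min_variance)
    ranked = sorted(
        kept,
        key=lambda j: abs_correlation([row[j] for row in rows], labels),
        reverse=True,
    )
    return sorted(ranked[:k])


# Toy stand-ins: 2-dimensional "embeddings" and 3 content features per tweet.
context = [[0.9, 0.5], [0.8, 0.5], [0.1, 0.5], [0.2, 0.5]]
content = [[0.7, 0.3, 0.1], [0.6, 0.4, 0.9], [0.2, 0.3, 0.2], [0.1, 0.4, 0.8]]
labels = [1, 1, 0, 0]  # 1 = rumour, 0 = non-rumour

rows = [combine_features(c, m) for c, m in zip(context, content)]
selected = two_step_selection(rows, labels, k=2)
print(selected)
```

In this toy data the constant column is dropped by the first step, and the second step keeps the two columns that track the label most closely. A real pipeline would feed the selected columns to a classifier and would use actual embeddings rather than hand-written numbers.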