Alqadi Basma S, Alsuhibany Suliman A, Yousafzai Samia Nawaz, Alzu'bi Sharf, Alsekait Deema Mohammed, AbdElminaam Diaa Salama
Computer Science Department, College of Computer and Information Science, Imam Mohammad Ibn Saud Islamic University, Riyadh, Saudi Arabia.
Department of Computer Science, College of Computer, Qassim University, 51452, Buraydah, Saudi Arabia.
Sci Rep. 2025 Aug 5;15(1):28490. doi: 10.1038/s41598-025-10670-2.
The use of social media to spread false information is both widespread and serious. The extensive dissemination of fake news, whether produced by humans or by computer programs, harms society and individuals alike, politically and socially. In the current era of social networks, news spreads so quickly that establishing the reliability of information is a challenge. Automated technologies that can identify fake news have therefore become critically important. Existing fake news detection methods often suffer from limited labeled data, an inability to fully capture complex linguistic nuances, and inadequate integration of different embedding techniques, all of which restrict their effectiveness and generalizability. To address these limitations, we propose a novel multi-stage transfer learning framework that leverages the strengths of pre-trained large language models, particularly RoBERTa, and is tailored specifically to fake news detection in limited-data scenarios. Unlike prior studies, which rely primarily on standard fine-tuning, our approach introduces a systematic comparison of word embedding techniques such as Word2Vec and one-hot encoding, combined with a refined fine-tuning process that enhances model performance and interpretability. By combining multi-stage transfer learning with embedding comparisons and task-specific optimizations, the approach enables more robust and accurate detection on small datasets. Experimental results on two real-world benchmark datasets show that our method improves accuracy by at least 3.9% over existing state-of-the-art models, while also providing insight into the role of embedding techniques in fake news classification.
In experiments on two real-world datasets, our transfer learning-based strategy outperforms the most advanced approaches by at least 3.9% in accuracy while also offering a rational explanation.
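The contrast between one-hot encoding and dense word embeddings mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' pipeline: the helper names (`build_vocab`, `one_hot`, `make_dense_table`, `mean_vector`) and the toy sentences are hypothetical, and the dense table is filled with random values as a stand-in for trained Word2Vec vectors. It only shows how the two representations differ in shape and sparsity.

```python
import random


def build_vocab(docs):
    """Map each distinct lowercase token to an index, in first-seen order."""
    vocab = {}
    for doc in docs:
        for tok in doc.lower().split():
            vocab.setdefault(tok, len(vocab))
    return vocab


def one_hot(token, vocab):
    """Sparse vector of vocabulary size with a single 1.0 at the token's index."""
    vec = [0.0] * len(vocab)
    vec[vocab[token]] = 1.0
    return vec


def make_dense_table(vocab, dim, seed=0):
    """Random dense vectors; an assumed placeholder for trained Word2Vec vectors."""
    rng = random.Random(seed)
    return {tok: [rng.uniform(-1.0, 1.0) for _ in range(dim)] for tok in vocab}


def mean_vector(doc, lookup):
    """Average the per-token vectors of a document into one feature vector."""
    vecs = [lookup(tok) for tok in doc.lower().split()]
    return [sum(column) / len(vecs) for column in zip(*vecs)]


# Toy corpus, invented for illustration only.
docs = ["breaking news shocks the world", "the world reads the news"]
vocab = build_vocab(docs)
table = make_dense_table(vocab, dim=4)

# One-hot features grow with the vocabulary; dense features have a fixed size.
sparse = mean_vector(docs[0], lambda tok: one_hot(tok, vocab))
dense = mean_vector(docs[0], lambda tok: table[tok])
print(len(sparse), len(dense))
```

With one-hot encoding the feature length equals the vocabulary size and every pair of distinct words is equally dissimilar, whereas a trained dense embedding keeps a fixed, small dimension and can place related words close together. Either feature vector could then be passed to a downstream classifier.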