School of Humanities and Communication, Zhejiang Gongshang University, Hangzhou, China.
School of Information and Electronic Engineering, Zhejiang Gongshang University, Hangzhou, China.
PLoS One. 2024 Oct 24;19(10):e0312240. doi: 10.1371/journal.pone.0312240. eCollection 2024.
Fake news detection is an increasingly important topic in the information age. However, most current methods rely on pre-trained small language models (SLMs), which face significant limitations when processing news content that requires specialized knowledge, and this constrains the efficiency of fake news detection. To address these limitations, we propose the FND-LLM framework, which combines SLMs and large language models (LLMs) to exploit their complementary strengths and to explore the capabilities of LLMs in multimodal fake news detection. The FND-LLM framework integrates six components: a textual feature branch, a visual semantic branch, a visual tampering branch, a co-attention network, a cross-modal feature branch, and a large language model branch. The textual feature branch and the visual semantic branch extract the textual and visual information of the news content, respectively, and the co-attention network refines the interrelationship between the two. The visual tampering branch extracts image tampering features. The cross-modal feature branch enhances inter-modal complementarity through the CLIP model, while the large language model branch uses the inference capability of LLMs to provide auxiliary explanations for the detection process. Our experimental results indicate that the FND-LLM framework outperforms existing models, improving overall accuracy by 0.7%, 6.8%, and 1.3% on Weibo, Gossipcop, and Politifact, respectively.
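The abstract names a co-attention network and several feature branches but gives no layer definitions, so the following is only a minimal sketch of how such a fusion could be wired, not the authors' implementation. It assumes scaled dot-product attention with the keys reused as values, mean pooling, and a single logistic output. All function names (`attend`, `co_attention`, `fuse_and_score`) and the toy feature values are hypothetical.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def attend(queries, keys):
    """Scaled dot-product attention; keys double as values (an assumption)."""
    d = len(keys[0])
    out = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        out.append([sum(w * k[i] for w, k in zip(weights, keys)) for i in range(d)])
    return out


def mean_pool(rows):
    """Average a list of equal-length vectors into one vector."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def co_attention(text_feats, image_feats):
    """Let each modality attend to the other, then pool and concatenate."""
    text_ctx = attend(text_feats, image_feats)    # text tokens look at image regions
    image_ctx = attend(image_feats, text_feats)   # image regions look at text tokens
    return mean_pool(text_ctx) + mean_pool(image_ctx)  # list concatenation


def fuse_and_score(branch_vectors, weights, bias=0.0):
    """Concatenate branch outputs and apply a logistic layer (hypothetical head)."""
    fused = [x for vec in branch_vectors for x in vec]
    if len(fused) != len(weights):
        raise ValueError("weights must match the fused feature length")
    logit = sum(w * x for w, x in zip(weights, fused)) + bias
    return 1.0 / (1.0 + math.exp(-logit))


# Toy inputs standing in for the branch outputs (made-up numbers).
text = [[1.0, 0.0], [0.0, 1.0]]                 # 2 text tokens, dimension 2
image = [[0.5, 0.5], [1.0, 0.0], [0.0, 1.0]]    # 3 image regions, dimension 2
joint = co_attention(text, image)               # length-4 joint vector
tamper, clip_sim, llm_hint = [0.2], [0.8], [1.0]
prob_fake = fuse_and_score([joint, tamper, clip_sim, llm_hint], [0.1] * 7)
print(prob_fake)
```

In a real system the toy lists would be replaced by encoder outputs (for example from a text SLM, an image backbone, and CLIP), and the weights would be learned rather than fixed.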