Dixit Dheeraj Kumar, Bhagat Amit, Dangi Dharmendra
Department of Computer Applications, MANIT, Bhopal, India.
Soft comput. 2022;26(22):12545-12557. doi: 10.1007/s00500-022-07215-4. Epub 2022 Jun 16.
In recent years, rumours and fake news have spread widely and rapidly all over the world. This leads to the production and propagation of inaccurate news articles, and users amplify misinformation and fake news further by sharing it without proper verification. Hence, it is necessary to restrict the spread of fake information on mass media and to promote public confidence worldwide. To this end, this paper proposes an effective approach to fake news detection. The proposed methodology consists of four phases: data pre-processing, feature reduction, feature extraction, and classification. During data pre-processing, the input data are processed by tokenization, stop-word removal, and stemming. In the second phase, the features are reduced using probabilistic principal component analysis (PPCA) to improve accuracy. The extracted features are then passed to the classification phase, where the LSTM-LF algorithm is used to optimally classify each news item as fake or real. Four datasets are used for evaluation: Buzzfeed, GossipCop, ISOT, and Politifact. The performance evaluation and comparative analysis show that the proposed approach outperforms other fake news detection approaches.
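The first two phases of the pipeline (pre-processing, then PPCA feature reduction) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The abstract does not say which text representation feeds PPCA, so a plain bag-of-words count matrix is assumed. The stop-word list is a toy one, and the stemmer is a crude suffix stripper standing in for a real stemmer such as Porter's. PPCA uses the standard closed-form maximum-likelihood solution of Tipping and Bishop. The function names `preprocess`, `bag_of_words`, and `ppca_reduce` are hypothetical. The LSTM-LF classifier is not sketched, because the abstract does not define it.

```python
import re
from collections import Counter

import numpy as np

# Hypothetical toy stop-word list; a real system would use a full list.
STOP_WORDS = {"a", "an", "the", "is", "are", "was", "were", "of", "to",
              "in", "on", "and", "or", "for", "by", "with", "that", "this"}

# Hypothetical crude suffix stripper standing in for a real stemmer (e.g. Porter).
SUFFIXES = ("ing", "ed", "es", "s")


def stem(token):
    """Strip the first matching suffix, keeping a stem of at least 3 characters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Tokenize (lowercase alphabetic runs), drop stop words, then stem."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [stem(t) for t in tokens if t not in STOP_WORDS]


def bag_of_words(docs):
    """Build a sorted vocabulary and an (n_docs, vocab_size) term-count matrix."""
    token_lists = [preprocess(d) for d in docs]
    vocab = sorted({t for tokens in token_lists for t in tokens})
    index = {t: i for i, t in enumerate(vocab)}
    X = np.zeros((len(docs), len(vocab)))
    for row, tokens in enumerate(token_lists):
        for t, count in Counter(tokens).items():
            X[row, index[t]] = count
    return X, vocab


def ppca_reduce(X, q):
    """Closed-form maximum-likelihood PPCA (Tipping & Bishop, 1999).

    Returns the posterior mean E[z | x] of the q-dimensional latent
    variable for each row of X.
    """
    mu = X.mean(axis=0)
    Xc = X - mu
    d = X.shape[1]
    cov = Xc.T @ Xc / X.shape[0]
    eigvals, eigvecs = np.linalg.eigh(cov)          # ascending order
    eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]  # make descending
    # Noise variance: average of the d - q discarded eigenvalues.
    sigma2 = max(float(eigvals[q:].mean()), 0.0) if q < d else 0.0
    # W = U_q (Lambda_q - sigma^2 I)^{1/2}, with the arbitrary rotation R = I.
    W = eigvecs[:, :q] * np.sqrt(np.maximum(eigvals[:q] - sigma2, 0.0))
    M = W.T @ W + sigma2 * np.eye(q)
    # E[z | x] = M^{-1} W^T (x - mu), computed for all rows at once.
    return np.linalg.solve(M, W.T @ Xc.T).T          # shape (n_docs, q)


docs = [
    "Breaking: the senator was arrested for fraud",
    "The senator denies the fraud claims in a statement",
    "Scientists confirm the vaccine is safe and effective",
    "Shocking vaccine claims spread by anonymous accounts",
]
X, vocab = bag_of_words(docs)
Z = ppca_reduce(X, q=2)
print(X.shape, Z.shape)
```

Unlike plain PCA, PPCA models the discarded variance explicitly as isotropic noise (`sigma2`), which is what makes it a probabilistic model; the reduced vectors `Z` would then be the input to whatever classifier follows.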