Abdelminaam Diaa Salama, Ismail Fatma Helmy, Taha Mohamed, Taha Ahmed, Houssein Essam H, Nabil Ayman
Faculty of Computers and Artificial Intelligence, Benha University, Benha 13511, Egypt.
Faculty of Computer Science, Misr International University, Cairo 11341, Egypt.
IEEE Access. 2021 Feb 9;9:27840-27867. doi: 10.1109/ACCESS.2021.3058066. eCollection 2021.
COVID-19 has affected everyone's lives. While COVID-19 is on the rise, misinformation about the virus is growing in parallel. The spread of this misinformation has created confusion among people, caused disturbances in society, and even led to deaths. Social media is central to our daily lives, and the Internet has become a significant source of knowledge. Owing to the widespread damage caused by fake news, it is important to build computerized systems to detect it. This paper proposes modified deep neural networks for identifying fake news: the Modified-LSTM (one to three layers) and the Modified-GRU (one to three layers). In particular, we investigate a large dataset of tweets conveying information about COVID-19, and we classify the dubious claims into two categories: fake or non-fake. We compare the prediction accuracy of the proposed approaches with six baseline machine learning techniques: Decision Tree (DT), Logistic Regression (LR), K Nearest Neighbor (KNN), Random Forest (RF), Support Vector Machine (SVM), and Naive Bayes (NB). The parameters of the deep learning techniques are optimized using Keras-tuner. Four benchmark datasets were used. Two feature extraction methods were used to extract essential features from these datasets: TF-IDF with N-grams for the baseline machine learning models, and word embeddings for the proposed deep neural network methods. The results obtained with the proposed framework show high accuracy in detecting fake and non-fake tweets containing COVID-19 information, a significant improvement over the existing state-of-the-art results of the baseline machine learning models.
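To make the N-gram TF-IDF feature extraction used for the baseline models more concrete, here is a minimal standard-library sketch. It is not the authors' implementation. The function names (`word_ngrams`, `tfidf_vectors`), the whitespace tokenization, the unigram-plus-bigram range, the smoothed IDF formula, and the toy tweets are all assumptions made for illustration.

```python
import math
from collections import Counter


def word_ngrams(text, n_min=1, n_max=2):
    """Return all word n-grams of length n_min..n_max (lowercased, whitespace-split)."""
    tokens = text.lower().split()
    grams = []
    for n in range(n_min, n_max + 1):
        for i in range(len(tokens) - n + 1):
            grams.append(" ".join(tokens[i:i + n]))
    return grams


def tfidf_vectors(docs, n_min=1, n_max=2):
    """Compute one sparse TF-IDF dict (n-gram -> weight) per document.

    tf  = count of the n-gram in the document / total n-grams in the document
    idf = ln((1 + N) / (1 + df)) + 1   (a common smoothed variant; assumed here)
    """
    counts = [Counter(word_ngrams(d, n_min, n_max)) for d in docs]

    # Document frequency: in how many documents each n-gram occurs.
    df = Counter()
    for c in counts:
        df.update(c.keys())

    n_docs = len(docs)
    vectors = []
    for c in counts:
        total = sum(c.values())
        vectors.append({
            gram: (k / total) * (math.log((1 + n_docs) / (1 + df[gram])) + 1.0)
            for gram, k in c.items()
        })
    return vectors


if __name__ == "__main__":
    # Hypothetical toy tweets, invented for this example only.
    tweets = [
        "masks reduce virus spread",
        "garlic cures the virus",
        "vaccines reduce severe illness",
    ]
    for tweet, vec in zip(tweets, tfidf_vectors(tweets)):
        top = sorted(vec.items(), key=lambda kv: kv[1], reverse=True)[:3]
        print(tweet, "->", top)
```

N-grams that occur in few documents receive a higher weight than terms shared across many, so the resulting sparse vectors could then be passed to any of the six baseline classifiers listed above.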