Spatio-temporal approach for classification of COVID-19 pandemic fake news.

Author information

Agarwal I Y, Rana D P, Shaikh M, Poudel S

Affiliation

Sardar Vallabhbhai National Institute of Technology, Surat 395007, Gujarat, India.

Publication information

Soc Netw Anal Min. 2022;12(1):68. doi: 10.1007/s13278-022-00887-8. Epub 2022 Jun 27.

DOI:10.1007/s13278-022-00887-8

PMID:35789891

Full text link: https://pmc.ncbi.nlm.nih.gov/articles/PMC9244012/

Abstract

The spread of fake news during the global COVID-19 pandemic has had dangerous consequences for the economy and for public health. From the origin and spread of the virus to self-medication and hoaxes about vaccination, fake news created more panic than the fatality of the virus itself. For better infodemic preparedness and control, it is necessary to mitigate fear among people, manage rumours, and dispel misinformation. A survey of fake news during COVID-19 conducted by the Poynter Fact Check institute found that most fake news on COVID-19 originated in Brazil, India, Spain, and the United States. The fake news menace is severe in countries where trust in online media is high, such as Brazil, Kenya, and South Africa. Based on these observations, this study provides preliminary insight into the correlation between the credibility of a news item and its spatial and temporal meta-information: the country of the news source, the countries named in the news, and the publication date. The main contribution of this study is an analysis of the impact of spatial and temporal features on fake news classification, which to the best of our knowledge has not yet been explored. These features are not directly available in news articles online, so they are handcrafted. Spatial information is taken from article metadata, such as the origin of the news, and additional spatial information is extracted from the article text using NER tagging. These spatial features and temporal information, such as the date of origin of the news, are given as input to a Long Short-Term Memory (LSTM) model along with GloVe vectors and a word length vector. The accuracy of models with and without spatial and temporal information is compared. The model with spatial and temporal information achieved noteworthy results in fake news detection. To ensure the quality of prediction, various model parameters were tuned and recorded for the best possible results. Beyond accuracy, spatial and temporal information for fake news detection has several other important implications for governments and policy makers that will be instrumental in stimulating future research on this subject.
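
The handcrafting of spatial and temporal features described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The country list, the function names, the reference date, and the feature layout are all assumptions made for illustration, and a plain name lookup stands in for the NER tagging step so that the example runs without external libraries.

```python
from datetime import date

# Hypothetical, deliberately small country vocabulary (assumption).
# A real pipeline would use an NER tagger and a complete country list.
COUNTRIES = ["Brazil", "India", "Kenya", "South Africa", "Spain", "United States"]


def mentioned_countries(text):
    """Return the countries from COUNTRIES whose names occur in the text.

    This is a simple substring lookup used as a stand-in for NER tagging;
    it would wrongly match "India" inside "Indiana", for example.
    """
    lowered = text.lower()
    return [c for c in COUNTRIES if c.lower() in lowered]


def one_hot(countries):
    """Encode a collection of country names as a 0/1 vector over COUNTRIES."""
    return [1.0 if c in countries else 0.0 for c in COUNTRIES]


def temporal_features(published, reference=date(2020, 1, 1)):
    """Encode the publication date as days since an assumed reference date
    plus the calendar month."""
    return [float((published - reference).days), float(published.month)]


def build_features(text, source_country, published):
    """Concatenate source-country, mentioned-country, and date features."""
    return (
        one_hot([source_country])
        + one_hot(mentioned_countries(text))
        + temporal_features(published)
    )


example = build_features(
    "A viral post claims a home remedy tested in India and Brazil cures the virus.",
    source_country="Spain",
    published=date(2020, 4, 15),
)
print(example)
```

In a setup like the one the abstract describes, a vector of this kind would be supplied to the classifier alongside the GloVe word vectors and the word length vector. How the authors actually encode and combine these inputs is not stated in the abstract.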
