Zhang Jinjin, Pan Yang, Lin Han, Sun Zhoubao, Wu Pingping, Tu Juan
School of Computer Science, Nanjing Audit University, Nanjing, China.
School of Engineering Audit, Jiangsu Key Laboratory of Public Project Audit, Nanjing Audit University, Nanjing, China.
Arch Public Health. 2023 Sep 7;81(1):166. doi: 10.1186/s13690-023-01179-z.
The Coronavirus Disease 2019 (COVID-19) pandemic was a major shock to society, and the information problems that accompanied it had a similarly large impact. The urgent need to understand the Infodemic, i.e., the spread of false information related to the epidemic, has been highlighted. However, although interest in this phenomenon is growing, studies of the topic discovery, data collection, and data preparation phases of the information analysis process have been lacking.
Because the epidemic is unprecedented and has not yet ended, we aimed to examine the Infodemic-related literature published from January 2019 to December 2022.
We systematically searched the ScienceDirect and IEEE Xplore databases with some search limitations. From the retrieved literature we examined the titles, abstracts, keywords, and limitations sections. We then conducted an extensive structured analysis by filtering the literature and sorting out the available information.
A total of 47 papers met the requirements of this review. The researchers in all of these papers encountered different challenges, most of which concerned the data collection step, with few arising in the data preparation phase and almost none in topic discovery. The challenges fell mainly into the following questions: how to collect data quickly, how to obtain the required data samples, how to filter the data, what to do if the data set is too small, how to pick the right classifier, and how to deal with topic drift and diversity. In addition, researchers have proposed partial solutions to these challenges, and we have also proposed possible solutions.
This review found that the Infodemic is a rapidly growing research area that attracts researchers from different disciplines. The number of studies in this field has increased significantly in recent years, with contributions from researchers in different countries, including the United States, India, and China. Infodemic topic discovery, data collection, and data preparation are not easy, and each step faces different challenges. Although there is some research in this emerging field, many challenges remain to be addressed. These findings highlight the need for more studies to address these issues and fill these gaps.