Satu Md Shahriare, Khan Md Imran, Mahmud Mufti, Uddin Shahadat, Summers Matthew A, Quinn Julian M W, Moni Mohammad Ali
Department of Management Information Systems, Noakhali Science & Technology University, Noakhali, 3814, Bangladesh.
Department of Computer Science & Engineering, Gono Bishwabidyalay, Savar, Dhaka, 1344, Bangladesh.
Knowl Based Syst. 2021 Aug 17;226:107126. doi: 10.1016/j.knosys.2021.107126. Epub 2021 May 6.
COVID-19, caused by SARS-CoV-2 infection, varies greatly in severity but presents with serious respiratory symptoms along with vascular and other complications, particularly in older adults. The disease can be spread by both symptomatic and asymptomatic infected individuals. Uncertainty remains over key aspects of the virus's infectiousness (particularly for newly emerging variants), and the disease has had severe economic impacts globally. For these reasons, COVID-19 is the subject of intense and widespread discussion on social media platforms including Facebook and Twitter. These public forums substantially influence public opinion and in some cases can exacerbate the panic and misinformation that spread during the crisis. This work therefore aimed to design an intelligent clustering-based classification and topic extraction model, named TClustVID, that analyzes COVID-19-related public tweets to extract significant sentiments with high accuracy. We gathered COVID-19 Twitter datasets from the IEEE Dataport repository, employed a range of data preprocessing methods to clean the raw data, then applied tokenization and produced a word-to-index dictionary. Thereafter, different classification methods were applied to these datasets, which enabled us to compare the performance of traditional classification with that of TClustVID. Our analysis found that TClustVID showed higher performance than traditional methodologies, as determined by clustering criteria. Finally, we extracted significant topics from the clusters, split them into positive, neutral and negative sentiments, and identified the most frequent topics using the proposed model. This approach is able to rapidly identify commonly prevailing aspects of public opinions and attitudes related to COVID-19 and infection prevention strategies spreading among different populations.
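The preprocessing steps named in the abstract (cleaning raw tweets, tokenization, and building a word-to-index dictionary) can be illustrated with a minimal sketch. The abstract does not specify the cleaning rules or the indexing scheme, so everything here is an assumption made for illustration: the function names (clean_tweet, tokenize, build_word_index, encode) are hypothetical, the regular expressions are one plausible choice, and index 0 is reserved for unknown words by convention. This is not the authors' implementation.

```python
import re
from collections import Counter

def clean_tweet(text):
    """Lowercase a tweet and strip URLs, @mentions, '#' signs, and punctuation."""
    text = text.lower()
    text = re.sub(r"https?://\S+", " ", text)   # drop links
    text = re.sub(r"@\w+", " ", text)           # drop user mentions
    text = text.replace("#", " ")               # keep hashtag words, drop the symbol
    text = re.sub(r"[^a-z0-9\s]", " ", text)    # drop remaining non-alphanumerics
    return text

def tokenize(text):
    """Split a cleaned tweet into whitespace-separated tokens."""
    return clean_tweet(text).split()

def build_word_index(tweets):
    """Map each word to an integer; more frequent words get smaller indices.

    Index 0 is left unused so it can stand for unknown words (an assumption).
    Ties in frequency are broken alphabetically to keep the mapping stable.
    """
    counts = Counter(tok for tweet in tweets for tok in tokenize(tweet))
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return {word: i for i, (word, _) in enumerate(ordered, start=1)}

def encode(text, word_index):
    """Turn a tweet into a list of word indices, using 0 for unseen words."""
    return [word_index.get(tok, 0) for tok in tokenize(text)]

# Made-up example tweets, not taken from the IEEE Dataport datasets.
tweets = [
    "Stay home, stay safe! #COVID19 https://example.com",
    "@user Masks help stop COVID19 spread",
]
word_index = build_word_index(tweets)
encoded = [encode(t, word_index) for t in tweets]
```

Integer sequences of this kind are a common input format for downstream classifiers; which classifiers and clustering criteria TClustVID uses is not stated in the abstract and is not assumed here.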