Qorib Miftahul, Oladunni Timothy, Denis Max, Ososanya Esther, Cotae Paul
Department of Computer Science and Information Technology, University of the District of Columbia, Washington, DC, United States.
Department of Computer Science, Morgan State University, Baltimore, MD, United States.
Expert Syst Appl. 2023 Feb;212:118715. doi: 10.1016/j.eswa.2022.118715. Epub 2022 Sep 5.
In 2019 there was an outbreak of a coronavirus pandemic, also known as COVID-19. Many scientists believe that the pandemic originated in Wuhan, China, before spreading to other parts of the globe. To reduce the spread of the disease, decision makers encouraged measures such as hand washing, face masking, and social distancing. In early 2021, some countries, including the United States, began administering COVID-19 vaccines. Vaccination brought relief to the public; it also generated considerable debate between anti-vaccine and pro-vaccine groups. The controversy surrounding COVID-19 vaccines influenced the decision of several people on whether to accept or reject vaccination. Because of data limitations, social media data, collected by live-streaming public tweets through an Application Programming Interface (API) search, are considered a viable and reliable resource for studying public opinion on COVID-19 vaccine hesitancy. Thus, this study examines three sentiment computation methods (Azure Machine Learning, VADER, and TextBlob) to analyze COVID-19 vaccine hesitancy. Five learning algorithms (Random Forest, Logistic Regression, Decision Tree, LinearSVC, and Naïve Bayes) were deployed with different combinations of three vectorization methods (Doc2Vec, CountVectorizer, and TF-IDF). Vocabulary normalization was threefold: Porter stemming, lemmatization, and Porter stemming with lemmatization. For each vocabulary normalization strategy, we designed, developed, and evaluated 42 models. The study shows that COVID-19 vaccine hesitancy slowly decreases over time, suggesting that the public gradually feels warmer and more optimistic about COVID-19 vaccination. Moreover, combining Porter stemming and lemmatization increased model performance.
Finally, our experiments show that TextBlob + TF-IDF + LinearSVC performs best in classifying public sentiment as positive, neutral, or negative, with an accuracy, precision, recall, and F1 score of 0.96752, 0.96921, 0.92807, and 0.94702, respectively. That is, the best performance was achieved using TextBlob sentiment scores with TF-IDF vectorization and a LinearSVC classification model. We also found that combining two vectorizations (CountVectorizer and TF-IDF) decreases model accuracy.
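The abstract reports accuracy, precision, recall, and F1 for a three-class task (positive, neutral, negative) but does not say how the per-class scores were combined. The sketch below shows one common convention, macro averaging, in plain Python. The averaging choice, the function name `macro_metrics`, and the toy labels are assumptions made for illustration; none of them come from the study.

```python
from collections import Counter

LABELS = ("negative", "neutral", "positive")


def macro_metrics(y_true, y_pred, labels=LABELS):
    """Return accuracy plus macro-averaged precision, recall, and F1.

    Macro averaging (an assumption here) computes each metric per class
    and then takes the unweighted mean over the classes.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")

    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, guess in zip(y_true, y_pred):
        if truth == guess:
            tp[truth] += 1   # correct prediction for this class
        else:
            fp[guess] += 1   # predicted class gains a false positive
            fn[truth] += 1   # true class gains a false negative

    def ratio(num, den):
        # Define 0/0 as 0.0 so an absent class does not raise.
        return num / den if den else 0.0

    precisions, recalls, f1s = [], [], []
    for label in labels:
        p = ratio(tp[label], tp[label] + fp[label])
        r = ratio(tp[label], tp[label] + fn[label])
        precisions.append(p)
        recalls.append(r)
        f1s.append(ratio(2 * p * r, p + r))

    n = len(labels)
    return {
        "accuracy": sum(tp.values()) / len(y_true),
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


# Toy labels for illustration only (not data from the study).
toy_true = ["positive", "positive", "neutral", "neutral", "negative", "negative"]
toy_pred = ["positive", "neutral", "neutral", "neutral", "negative", "positive"]
scores = macro_metrics(toy_true, toy_pred)
print(scores)
```

Under macro averaging, the F1 score is the mean of the per-class F1 values, so it generally differs from the harmonic mean of the averaged precision and recall.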