Covid-19 vaccine hesitancy: Text mining, sentiment analysis and machine learning on COVID-19 vaccination Twitter dataset.

Authors

Qorib Miftahul, Oladunni Timothy, Denis Max, Ososanya Esther, Cotae Paul

Affiliations

Department of Computer Science and Information Technology, University of the District of Columbia, Washington, DC, United States.

Department of Computer Science, Morgan State University, Baltimore, MD, United States.

Publication information

Expert Syst Appl. 2023 Feb;212:118715. doi: 10.1016/j.eswa.2022.118715. Epub 2022 Sep 5.

DOI:10.1016/j.eswa.2022.118715

PMID:36092862

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9443617/

Abstract

In 2019, there was an outbreak of the coronavirus disease known as COVID-19, which became a pandemic. Many scientists believe that the pandemic originated in Wuhan, China, before spreading to other parts of the globe. To reduce the spread of the disease, decision makers encouraged measures such as hand washing, face masking, and social distancing. In early 2021, some countries, including the United States, began administering COVID-19 vaccines. Vaccination brought relief to the public, but it also generated considerable debate between anti-vaccine and pro-vaccine groups. The controversy surrounding COVID-19 vaccines influenced many people's decisions to accept or reject vaccination. Because of data limitations, social media data, collected by live-streaming public tweets through an Application Programming Interface (API) search, is considered a viable and reliable resource for studying public opinion on COVID-19 vaccine hesitancy. This study therefore examines three sentiment computation methods (Azure Machine Learning, VADER, and TextBlob) to analyze COVID-19 vaccine hesitancy. Five learning algorithms (Random Forest, Logistic Regression, Decision Tree, LinearSVC, and Naïve Bayes) were deployed with different combinations of three vectorization methods (Doc2Vec, CountVectorizer, and TF-IDF). Vocabulary normalization was threefold: Porter stemming, lemmatization, and Porter stemming with lemmatization. For each vocabulary normalization strategy, we designed, developed, and evaluated 42 models. The study shows that COVID-19 vaccine hesitancy slowly decreased over time, suggesting that the public gradually came to feel warm and optimistic about COVID-19 vaccination. Moreover, combining Porter stemming and lemmatization increased model performance. Finally, our experiments show that TextBlob + TF-IDF + LinearSVC performed best at classifying public sentiment as positive, neutral, or negative, with accuracy, precision, recall, and F1 score of 0.96752, 0.96921, 0.92807, and 0.94702, respectively. That is, the best performance was achieved with TextBlob sentiment scores, TF-IDF vectorization, and the LinearSVC classification model. We also found that combining two vectorizations (CountVectorizer and TF-IDF) decreased model accuracy.
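The four reported metrics can be illustrated with a small sketch for the three-class setting (positive, neutral, negative). This is not the authors' code. The abstract does not say how the per-class scores were averaged, so macro averaging is an assumption here, as are the zero thresholds in the hypothetical helper `label_from_polarity` and the toy scores and predictions.

```python
from collections import Counter

LABELS = ("negative", "neutral", "positive")


def label_from_polarity(polarity: float) -> str:
    """Map a polarity score in [-1, 1] to a class.

    The zero thresholds are an assumption made for illustration only.
    """
    if polarity > 0:
        return "positive"
    if polarity < 0:
        return "negative"
    return "neutral"


def macro_metrics(y_true, y_pred, labels=LABELS):
    """Return accuracy plus macro-averaged precision, recall, and F1."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative

    precisions, recalls, f1s = [], [], []
    for label in labels:
        p_den = tp[label] + fp[label]
        r_den = tp[label] + fn[label]
        p = tp[label] / p_den if p_den else 0.0
        r = tp[label] / r_den if r_den else 0.0
        f1 = 2 * p * r / (p + r) if (p + r) else 0.0
        precisions.append(p)
        recalls.append(r)
        f1s.append(f1)

    n = len(labels)
    return {
        "accuracy": sum(tp.values()) / len(y_true),
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


# Toy data: made-up polarity scores stand in for sentiment-tool output.
scores = [0.5, 0.0, -0.3, 0.8, 0.0, -0.6]
y_true = [label_from_polarity(s) for s in scores]
# Hypothetical classifier predictions with one neutral tweet misclassified.
y_pred = ["positive", "neutral", "negative", "positive", "positive", "negative"]

metrics = macro_metrics(y_true, y_pred)
print({name: round(value, 4) for name, value in metrics.items()})
```

Under macro averaging, F1 is the mean of the per-class F1 scores, so it need not equal the harmonic mean of the averaged precision and recall.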
