TClustVID: A novel machine learning classification model to investigate topics and sentiment in COVID-19 tweets.

Author information

Satu Md Shahriare, Khan Md Imran, Mahmud Mufti, Uddin Shahadat, Summers Matthew A, Quinn Julian M W, Moni Mohammad Ali

Affiliations

Department of Management Information Systems, Noakhali Science & Technology University, Noakhali, 3814, Bangladesh.

Department of Computer Science & Engineering, Gono Bishwabidyalay, Savar, Dhaka, 1344, Bangladesh.

Publication information

Knowl Based Syst. 2021 Aug 17;226:107126. doi: 10.1016/j.knosys.2021.107126. Epub 2021 May 6.

DOI:10.1016/j.knosys.2021.107126

PMID:33972817

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8099549/

Abstract

COVID-19, caused by SARS-CoV-2 infection, varies greatly in severity but can present with serious respiratory symptoms along with vascular and other complications, particularly in older adults. The disease can be spread by both symptomatic and asymptomatic infected individuals. Uncertainty remains over key aspects of the virus's infectiousness (particularly that of newly emerging variants), and the disease has had severe economic impacts globally. For these reasons, COVID-19 is the subject of intense and widespread discussion on social media platforms, including Facebook and Twitter. These public forums substantially influence public opinion and, in some cases, can exacerbate the panic and misinformation that spread widely during the crisis. This work therefore aimed to design an intelligent clustering-based classification and topic extraction model, named TClustVID, that analyzes public COVID-19-related tweets to extract significant sentiments with high accuracy. We gathered COVID-19 Twitter datasets from the IEEE Dataport repository and applied a range of data preprocessing methods to clean the raw data, then tokenized the text and produced a word-to-index dictionary. We then applied several classification methods to these datasets to compare the performance of traditional classification with that of TClustVID. Our analysis found that TClustVID outperformed the traditional methods, as determined by clustering criteria. Finally, we extracted significant topics from the clusters, split them into positive, neutral and negative sentiments, and identified the most frequent topics using the proposed model. This approach can rapidly identify prevailing public opinions and attitudes toward COVID-19 and infection prevention strategies as they spread among different populations.
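The abstract describes the preprocessing pipeline only at a high level: clean the raw tweets, tokenize them, build a word-to-index dictionary, and later group extracted topics by sentiment. The following is a minimal Python sketch of those steps under stated assumptions. The regular expressions, the tiny stopword list, the reserved ids 0 (padding) and 1 (out-of-vocabulary), and the helper names `clean_tweet`, `tokenize`, `build_word_index`, `encode` and `top_terms_by_sentiment` are all hypothetical illustrations, not TClustVID's actual implementation. Counting frequent terms per sentiment label is a crude stand-in for the paper's clustering-based topic extraction, and the sentiment labels are assumed to be already known.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list (the paper's list is not given).
STOPWORDS = {"a", "an", "and", "are", "be", "for", "in", "is", "it",
             "of", "on", "the", "this", "to"}

URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
NON_LETTER_RE = re.compile(r"[^a-z\s]")

PAD_ID, OOV_ID = 0, 1  # assumed reserved ids: padding and out-of-vocabulary


def clean_tweet(text: str) -> str:
    """Lowercase a tweet and strip URLs, @mentions and non-letter characters."""
    text = text.lower()
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    text = NON_LETTER_RE.sub(" ", text)  # removes '#', digits, punctuation, emoji
    return " ".join(text.split())  # collapse repeated whitespace


def tokenize(text: str) -> list[str]:
    """Split a cleaned tweet on whitespace and drop stopwords."""
    return [tok for tok in clean_tweet(text).split() if tok not in STOPWORDS]


def build_word_index(token_lists: list[list[str]]) -> dict[str, int]:
    """Assign each token an integer id, most frequent first (ties alphabetical)."""
    counts = Counter(tok for toks in token_lists for tok in toks)
    ordered = sorted(counts, key=lambda tok: (-counts[tok], tok))
    # Ids start after the reserved PAD_ID and OOV_ID.
    return {tok: idx for idx, tok in enumerate(ordered, start=OOV_ID + 1)}


def encode(tokens: list[str], word_index: dict[str, int]) -> list[int]:
    """Convert tokens to ids, mapping unseen tokens to OOV_ID."""
    return [word_index.get(tok, OOV_ID) for tok in tokens]


def top_terms_by_sentiment(token_lists: list[list[str]], labels: list[str],
                           k: int = 3) -> dict[str, list[str]]:
    """Return the k most frequent tokens per sentiment label.

    A crude frequency-based stand-in for topic extraction; labels are assumed given.
    """
    groups: dict[str, Counter] = {}
    for toks, label in zip(token_lists, labels):
        groups.setdefault(label, Counter()).update(toks)
    return {label: [tok for tok, _ in c.most_common(k)] for label, c in groups.items()}


if __name__ == "__main__":
    tweets = [
        "Stay home and wear a mask! https://t.co/xyz",
        "@WHO vaccine rollout is great news #COVID19",
        "Lockdown again... tired of this lockdown",
    ]
    labels = ["neutral", "positive", "negative"]
    token_lists = [tokenize(t) for t in tweets]
    print(token_lists[0])  # ['stay', 'home', 'wear', 'mask']
    word_index = build_word_index(token_lists)
    print(encode(["lockdown", "mask", "zoom"], word_index))  # [2, 7, 1]
    print(top_terms_by_sentiment(token_lists, labels, k=1)["negative"])  # ['lockdown']
```

Ordering the vocabulary by frequency keeps the most common terms at small ids, a common convention for word-to-index dictionaries that feed embedding layers; the ordering TClustVID itself uses is not stated in the abstract.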

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e115/8099549/cd625355ffbd/gr1_lrg.jpg
