Wang Tianyi, Lu Ke, Chow Kam Pui, Zhu Qing
Department of Computer Science, The University of Hong Kong, Hong Kong.
Department of Social Work and Social Administration, The University of Hong Kong, Hong Kong.
IEEE Access. 2020 Jul 28;8:138162-138169. doi: 10.1109/ACCESS.2020.3012595. eCollection 2020.
Coronavirus disease 2019 (COVID-19) poses massive challenges for the world. Analyzing public sentiment during the outbreak provides useful information for shaping appropriate public health responses. On Sina Weibo, a popular Chinese social media platform, posts with negative sentiment are valuable for analyzing public concerns. We analyze 999,978 randomly selected COVID-19-related Weibo posts published from 1 January 2020 to 18 February 2020. Specifically, the BERT (Bidirectional Encoder Representations from Transformers) model, pre-trained without supervision and then fine-tuned, is adopted to classify posts into sentiment categories (positive, neutral, and negative), and the TF-IDF (term frequency-inverse document frequency) model is used to summarize the topics of posts. Trend analysis and thematic analysis are conducted to identify the characteristics of negative sentiment. In general, the fine-tuned BERT performs sentiment classification with considerable accuracy, and the topics extracted by TF-IDF accurately convey the characteristics of posts about COVID-19. We observed that people are concerned about four aspects of COVID-19: Virus Origin (Gamey Food, 3.08%; Bat, 2.70%; Conspiracy Theory, 1.43%), Symptom (Fever, 2.13%; Cough, 1.19%), Production Activity (Go to Work, 1.94%; Resume Work, 1.12%; School New Semester Beginning, 1.06%), and Public Health Control (Temperature Taking, 1.39%; Coronavirus Cover-up, 1.26%; City Shutdown, 1.09%). These results offer constructive guidance for public health responses: transparent information sharing and scientific guidance might help alleviate public concerns.
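To illustrate how TF-IDF can surface the characteristic terms of a post, the following is a minimal sketch and not the authors' implementation. It assumes the common formulation in which term frequency is the term's share of the post's tokens and inverse document frequency is the natural logarithm of the number of posts divided by the number of posts containing the term; the abstract does not state which variant was used. The function names `tf_idf` and `top_terms` and the toy posts are hypothetical. Real Weibo posts would first need Chinese word segmentation, which is omitted here.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: score} dict per tokenized document."""
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    scores = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        # tf = share of the document's tokens; idf = ln(N / df).
        scores.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return scores


def top_terms(score, k=2):
    """Return the k highest-scoring terms, breaking ties alphabetically."""
    ranked = sorted(score.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]


# Hypothetical, pre-tokenized toy posts (not data from the study).
posts = [
    ["fever", "cough", "fever", "hospital"],
    ["bat", "virus", "origin", "bat"],
    ["city", "shutdown", "virus", "hospital"],
]

scores = tf_idf(posts)
for score in scores:
    print(top_terms(score))
```

Terms that occur in only one post (such as "fever" or "bat" in this toy data) receive the highest weights, while terms shared across posts are down-weighted. This is the property that makes TF-IDF suitable for summarizing what distinguishes a group of posts.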