Singhal Aditya, Baxi Manmeet Kaur, Mago Vijay
Department of Computer Science, Lakehead University, Thunder Bay, ON, Canada.
JMIR Med Inform. 2022 Aug 18;10(8):e37829. doi: 10.2196/37829.
Social media platforms (SMPs) are frequently used by various pharmaceutical companies, public health agencies, and nongovernment organizations (NGOs) for communicating health concerns, new advancements, and potential outbreaks. Although the benefits of using them as a tool have been extensively discussed, the online activity of various health care organizations on SMPs during COVID-19 in terms of engagement and sentiment forecasting has not been thoroughly investigated.
The purpose of this research is to analyze the nature of information shared on Twitter, understand the public engagement generated on it, and forecast the sentiment score for various organizations.
Data were collected from the Twitter handles of 5 pharmaceutical companies, 10 US and Canadian public health agencies, and the World Health Organization (WHO) from January 1, 2017, to December 31, 2021. A total of 181,469 tweets were divided into 2 phases for the analysis, before COVID-19 and during COVID-19, based on the confirmation of the first COVID-19 community transmission case in North America on February 26, 2020. We conducted content analysis to generate health-related topics using natural language processing (NLP)-based topic-modeling techniques, analyzed public engagement on Twitter, and performed sentiment forecasting using 16 univariate moving-average and machine learning (ML) models to understand the correlation between public opinion and tweet contents.
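The two-phase split and the simplest of the univariate forecasters mentioned above can be illustrated with a minimal sketch. The cutoff date comes from the text; the function names, the `(date, text)` tweet representation, the treatment of the cutoff day itself as "during COVID-19", and the window size are assumptions made for illustration, not the authors' implementation.

```python
from datetime import date

# Cutoff from the text: first confirmed COVID-19 community transmission
# case in North America.
COVID_CUTOFF = date(2020, 2, 26)


def split_by_phase(tweets):
    """Split (date, text) pairs into 'before' and 'during' COVID-19 lists.

    Assumption: tweets posted on the cutoff date itself count as 'during'.
    """
    before = [t for t in tweets if t[0] < COVID_CUTOFF]
    during = [t for t in tweets if t[0] >= COVID_CUTOFF]
    return before, during


def moving_average_forecast(series, window=3):
    """One-step-ahead forecast: the mean of the last `window` observations.

    A plain moving average, used here as a stand-in for the simplest kind
    of univariate model; the window size is a hypothetical choice.
    """
    if window < 1 or len(series) < window:
        raise ValueError("window must be >= 1 and <= len(series)")
    return sum(series[-window:]) / window
```

Applied to a per-organization series of daily or weekly sentiment scores, `moving_average_forecast` would give a baseline against which models such as ARIMA could be compared.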
We modeled topics from the tweets authored by the selected health care organizations using nonnegative matrix factorization (NMF; coherence score c=-3.6530 and -3.7944 before and during COVID-19, respectively). The topics were chronic diseases, health research, community health care, medical trials, COVID-19, vaccination, nutrition and well-being, and mental health. In terms of user impact, WHO (user impact=4171.24) had the highest impact overall, followed by two public health agencies: the Centers for Disease Control and Prevention (CDC; user impact=2895.87) and the National Institutes of Health (NIH; user impact=891.06). Among pharmaceutical companies, Pfizer had the highest user impact, at 97.79. For sentiment forecasting, autoregressive integrated moving average (ARIMA) and seasonal autoregressive integrated moving average with exogenous factors (SARIMAX) models performed best on the majority of the data subsets (divided by health care organization and period), with the mean absolute error (MAE) between 0.027 and 0.084, the mean square error (MSE) between 0.001 and 0.011, and the root-mean-square error (RMSE) between 0.031 and 0.105.
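For reference, the three error metrics reported above can be computed as in the following sketch. These are the standard definitions; the function names and the sample values in the usage note are illustrative assumptions, not data from the study.

```python
import math


def mae(actual, predicted):
    """Mean absolute error: average of |actual - predicted|."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mse(actual, predicted):
    """Mean square error: average of (actual - predicted)^2."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root-mean-square error: square root of the MSE."""
    return math.sqrt(mse(actual, predicted))
```

For example, with hypothetical sentiment scores `actual = [0.10, 0.20, 0.30]` and forecasts `predicted = [0.12, 0.18, 0.33]`, `rmse` is by construction the square root of `mse`. The same relation ties the reported upper bounds together: the square root of 0.011 is approximately 0.105.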
Our findings indicate that people engage more on topics such as COVID-19 than medical trials and customer experience. In addition, there are notable differences in the user engagement levels across organizations. Global organizations, such as WHO, show wide variations in engagement levels over time. The sentiment forecasting method discussed presents a way for organizations to structure their future content to ensure maximum user engagement.