Department of Computer Science, Khwaja Fareed University of Engineering and Information Technology, Rahim Yar Khan, Pakistan.
Department of Computer Science & Information Technology, The Islamia University of Bahawalpur, Bahawalpur, Punjab, Pakistan.
PLoS One. 2021 Feb 25;16(2):e0245909. doi: 10.1371/journal.pone.0245909. eCollection 2021.
The spread of Covid-19 has raised health concerns worldwide, and social media is increasingly used to share news and opinions about it. A realistic assessment of the situation is necessary to use resources optimally and appropriately. In this research, we perform sentiment analysis of Covid-19 tweets using a supervised machine learning approach. Identifying Covid-19 sentiments from tweets would support informed decisions for better handling of the current pandemic. The dataset is extracted from Twitter using the tweet IDs provided by IEEE DataPort. Tweets are retrieved by an in-house crawler built on the Tweepy library. The dataset is cleaned using preprocessing techniques, and sentiments are extracted using the TextBlob library. The contribution of this work is a performance evaluation of various machine learning classifiers using our proposed feature set, which is formed by concatenating bag-of-words (BoW) and term frequency-inverse document frequency (TF-IDF) features. Tweets are classified as positive, neutral, or negative, and classifier performance is evaluated on accuracy, precision, recall, and F1 score. For completeness, the dataset is further investigated using the Long Short-Term Memory (LSTM) deep learning architecture. The results show that the Extra Trees Classifier outperforms all other models, achieving an accuracy of 0.93 with our proposed concatenated feature set. The LSTM achieves lower accuracy than the machine learning classifiers. To demonstrate the effectiveness of the proposed feature set, the results are compared with the VADER sentiment analysis technique based on the GloVe feature extraction approach.
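The labelling and feature-construction steps described in the abstract can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation, and every function name below is hypothetical. Several details are assumptions not stated in the abstract: the polarity thresholds (score > 0 positive, < 0 negative, exactly 0 neutral), the simple regex tokenizer standing in for real preprocessing, and the TF-IDF variant (smoothed IDF, ln((1 + n) / (1 + df)) + 1, with L2 row normalisation, mirroring the defaults of scikit-learn's `TfidfVectorizer`). In a real pipeline the polarity would come from TextBlob (`TextBlob(text).sentiment.polarity`), and the concatenated rows would be fed to a classifier such as scikit-learn's `ExtraTreesClassifier`.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Tuple


def polarity_to_label(polarity: float) -> str:
    """Map a polarity score in [-1, 1] to one of three classes.

    Assumed thresholds: > 0 positive, < 0 negative, exactly 0 neutral.
    """
    if polarity > 0:
        return "positive"
    if polarity < 0:
        return "negative"
    return "neutral"


def tokenize(text: str) -> List[str]:
    # Lowercase and keep alphanumeric runs (a stand-in for real preprocessing).
    return re.findall(r"[a-z0-9]+", text.lower())


def build_vocabulary(docs: List[List[str]]) -> Dict[str, int]:
    # Sorted vocabulary so that column order is deterministic.
    terms = sorted({term for doc in docs for term in doc})
    return {term: i for i, term in enumerate(terms)}


def bow_vector(doc: List[str], vocab: Dict[str, int]) -> List[float]:
    # Raw term counts: the bag-of-words half of the feature row.
    vec = [0.0] * len(vocab)
    for term, count in Counter(doc).items():
        if term in vocab:
            vec[vocab[term]] = float(count)
    return vec


def idf_weights(docs: List[List[str]], vocab: Dict[str, int]) -> List[float]:
    # Smoothed IDF: ln((1 + n) / (1 + df)) + 1 (assumed variant).
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    idf = [0.0] * len(vocab)
    for term, i in vocab.items():
        idf[i] = math.log((1 + n) / (1 + df[term])) + 1.0
    return idf


def tfidf_vector(bow: List[float], idf: List[float]) -> List[float]:
    # Weight counts by IDF, then L2-normalise the row.
    raw = [tf * w for tf, w in zip(bow, idf)]
    norm = math.sqrt(sum(x * x for x in raw))
    return [x / norm for x in raw] if norm > 0 else raw


def concatenated_features(
    texts: List[str],
) -> Tuple[List[List[float]], Dict[str, int]]:
    """Return one row per text: BoW columns followed by TF-IDF columns."""
    docs = [tokenize(t) for t in texts]
    vocab = build_vocabulary(docs)
    idf = idf_weights(docs, vocab)
    rows = []
    for doc in docs:
        bow = bow_vector(doc, vocab)
        rows.append(bow + tfidf_vector(bow, idf))
    return rows, vocab


if __name__ == "__main__":
    tweets = ["stay home stay safe", "vaccine news is good", "cases rising again"]
    features, vocab = concatenated_features(tweets)
    print(len(vocab), len(features[0]))  # each row has 2 * len(vocab) columns
    print(polarity_to_label(0.35), polarity_to_label(0.0), polarity_to_label(-0.4))
```

Concatenating the two views simply doubles the column count: the first half keeps absolute term frequencies, while the second half down-weights terms that occur in many tweets. This is one plausible reading of "concatenating the bag-of-words and the term frequency-inverse document frequency", not a confirmed description of the paper's exact preprocessing.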