Amin Samina, Alharbi Abdullah, Uddin M Irfan, Alyami Hashem
Institute of Computing, Kohat University of Science and Technology, Kohat, 2600 Pakistan.
Department of Information Technology, College of Computers and Information Technology, Taif University, P. O. Box 11099, Taif, 21944 Saudi Arabia.
Soft Comput. 2022;26(20):11077-11089. doi: 10.1007/s00500-022-07405-0. Epub 2022 Aug 10.
The COVID-19 infection, which began in December 2019, has claimed many lives and affected all aspects of human life. The World Health Organization (WHO) subsequently declared COVID-19 a pandemic, and it has put massive pressure on global health systems. During this ongoing pandemic, the exponential growth of social media platforms has provided a valuable channel for distributing information, as well as a source of self-reported disease symptoms in public discourse. There is therefore an urgent need for effective approaches to detect self-reported symptoms or cases in social media content. In this study, we scraped public discourse on COVID-19 symptoms from Twitter. From it, we built a large dataset of self-reported COVID-19 symptoms and gold-annotated the tweets into four categories: confirmed, death, suspected, and recovered. We then applied machine learning and deep learning models, each with its own feature representation. Experiments were conducted with recurrent neural network (RNN) variants, and their performance was compared with that of traditional machine learning algorithms. The experimental results indicate that optimizing the area under the curve (AUC) enhances model performance, and that the long short-term memory (LSTM) network achieves the highest accuracy in detecting COVID-19 symptoms in real-time public messaging. The LSTM classifier in the proposed pipeline achieves a classification accuracy of 90.7%, outperforming existing state-of-the-art algorithms for multi-class classification.
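The two evaluation measures named in the abstract, accuracy and AUC, can be illustrated for the four-category setting with a minimal sketch. It is not the authors' code: the abstract does not say how AUC was averaged across classes, so the macro-averaged one-vs-rest scheme used here is an assumption, and the function names and toy probabilities are hypothetical values invented for illustration, not results from the study.

```python
from typing import Dict, List, Sequence

# The four annotation categories named in the abstract.
LABELS = ["confirmed", "death", "suspected", "recovered"]


def predict(probs: Sequence[Dict[str, float]]) -> List[str]:
    """Pick the highest-probability label for each tweet."""
    return [max(p, key=p.get) for p in probs]


def accuracy(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Fraction of tweets whose predicted label matches the annotation."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def binary_auc(is_positive: Sequence[bool], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive outranks a random
    negative, with ties counted as one half."""
    pos = [s for flag, s in zip(is_positive, scores) if flag]
    neg = [s for flag, s in zip(is_positive, scores) if not flag]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def macro_ovr_auc(y_true: Sequence[str], probs: Sequence[Dict[str, float]]) -> float:
    """Assumed averaging scheme: one-vs-rest AUC per label, then the mean."""
    aucs = []
    for label in LABELS:
        is_pos = [t == label for t in y_true]
        scores = [p[label] for p in probs]
        aucs.append(binary_auc(is_pos, scores))
    return sum(aucs) / len(aucs)


# Hypothetical toy data: six annotated tweets and made-up class probabilities.
toy_true = ["confirmed", "death", "suspected", "recovered", "confirmed", "suspected"]
toy_probs = [
    {"confirmed": 0.70, "death": 0.10, "suspected": 0.10, "recovered": 0.10},
    {"confirmed": 0.10, "death": 0.60, "suspected": 0.20, "recovered": 0.10},
    {"confirmed": 0.20, "death": 0.10, "suspected": 0.60, "recovered": 0.10},
    {"confirmed": 0.10, "death": 0.10, "suspected": 0.10, "recovered": 0.70},
    {"confirmed": 0.15, "death": 0.10, "suspected": 0.65, "recovered": 0.10},  # misclassified
    {"confirmed": 0.20, "death": 0.10, "suspected": 0.60, "recovered": 0.10},
]

toy_pred = predict(toy_probs)
print(f"accuracy  = {accuracy(toy_true, toy_pred):.3f}")        # 0.833
print(f"macro AUC = {macro_ovr_auc(toy_true, toy_probs):.3f}")  # 0.875
```

On this toy input the two measures differ because accuracy only counts the top-ranked label, whereas AUC reflects how well each category's scores rank its own tweets above the rest.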