Luo Linkai, Wang Yue, Liu Hai
Department of Supply Chain and Information Management, The Hang Seng University of Hong Kong, Hong Kong Special Administrative Region.
Department of Computing, The Hang Seng University of Hong Kong, Hong Kong Special Administrative Region.
Expert Syst Appl. 2022 Aug 15;200:117139. doi: 10.1016/j.eswa.2022.117139. Epub 2022 Apr 2.
Twitter offers extensive and valuable information on the spread of COVID-19 and the current state of public health. Mining tweets could be an important supplement for public health departments, helping them monitor the status of COVID-19 in a timely manner and take appropriate actions to minimize its impact. Identifying personal health mentions (PHMs) is the first step of public health surveillance on social media. The task is to determine whether a tweet mentions a person's health condition, and it is crucial for tracking pandemic conditions in real time. However, social media texts contain noise, many creative and novel phrases, sarcastic emoji expressions, and misspellings. In addition, class imbalance is usually severe. To address these challenges, we built a COVID-19 PHM dataset containing more than 11,000 annotated tweets and proposed a dual convolutional neural network (CNN) framework using this dataset. In the dual CNN structure, an auxiliary CNN provides supplemental information to the primary CNN so that PHMs can be detected from tweets more effectively. Experiments show that the proposed structure can alleviate the effect of class imbalance and achieve promising results. This automated approach could monitor public health in real time and spare disease-prevention departments the tedious manual work of public health surveillance.
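The abstract describes the dual CNN only at a high level, so the following is a minimal illustrative sketch rather than the authors' implementation. It assumes that the auxiliary CNN "supplements" the primary CNN by concatenating its pooled features with the primary features before a single sigmoid output; the fusion strategy, the class name `DualCNNSketch`, the helper `conv1d_relu_maxpool`, and all sizes are hypothetical choices made for illustration. Weights are random and untrained, so the output probability carries no meaning beyond showing the data flow.

```python
import numpy as np


def conv1d_relu_maxpool(x, kernels, bias):
    """Convolve over tokens, apply ReLU, then max-over-time pooling.

    x:       (seq_len, emb_dim) token embeddings
    kernels: (n_filters, width, emb_dim)
    bias:    (n_filters,)
    Returns a (n_filters,) feature vector.
    """
    n_filters, width, _ = kernels.shape
    seq_len = x.shape[0]
    # Stack every window of `width` consecutive tokens: (n_windows, width, emb_dim).
    windows = np.stack([x[i:i + width] for i in range(seq_len - width + 1)])
    feature_maps = np.einsum("nwe,fwe->nf", windows, kernels) + bias
    return np.maximum(feature_maps, 0.0).max(axis=0)


class DualCNNSketch:
    """Hypothetical two-branch CNN: primary and auxiliary branches fused by concatenation."""

    def __init__(self, vocab_size, emb_dim=16, n_filters=8, width=3, seed=0):
        rng = np.random.default_rng(seed)
        self.embedding = rng.normal(0.0, 0.1, (vocab_size, emb_dim))
        # Each branch has its own filters (assumption: both read the same embeddings).
        self.primary_k = rng.normal(0.0, 0.1, (n_filters, width, emb_dim))
        self.primary_b = np.zeros(n_filters)
        self.aux_k = rng.normal(0.0, 0.1, (n_filters, width, emb_dim))
        self.aux_b = np.zeros(n_filters)
        # Output layer sees the concatenated features of both branches.
        self.out_w = rng.normal(0.0, 0.1, 2 * n_filters)
        self.out_b = 0.0

    def predict_proba(self, token_ids):
        """Return an (untrained) probability that the tweet is a personal health mention."""
        x = self.embedding[np.asarray(token_ids)]
        primary = conv1d_relu_maxpool(x, self.primary_k, self.primary_b)
        auxiliary = conv1d_relu_maxpool(x, self.aux_k, self.aux_b)
        fused = np.concatenate([primary, auxiliary])  # assumed fusion step
        logit = fused @ self.out_w + self.out_b
        return float(1.0 / (1.0 + np.exp(-logit)))


# Usage: token ids are placeholders for a tokenized tweet (at least `width` tokens).
model = DualCNNSketch(vocab_size=50)
p = model.predict_proba([3, 17, 42, 8, 5])
```

In a trained system, one way to target the class imbalance mentioned in the abstract would be to train the auxiliary branch with a different sampling or loss weighting than the primary branch; whether the paper does this is not stated in the abstract.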