Davoudi Anahita, Klein Ari Z, Sarker Abeed, Gonzalez-Hernandez Graciela
Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104.
Department of Biomedical Informatics, School of Medicine, Emory University, Atlanta, GA 30322.
AMIA Jt Summits Transl Sci Proc. 2020 May 30;2020:136-141. eCollection 2020.
With the increasing use of social media data for health-related research, the credibility of information from this source has been questioned, as the posts may not originate from personal accounts. While automatic bot detection approaches have been proposed, none have been evaluated on users posting health-related information. In this paper, we extend an existing bot detection system and customize it for health-related research. Using a dataset of Twitter users, we first show that the system, which was designed for political bot detection, underperforms when applied to health-related Twitter users. We then incorporate additional features and a statistical machine learning classifier to improve bot detection performance significantly. Our approach obtains an F1-score of 0.7 for the "bot" class, an improvement of 0.339. Our approach is customizable and generalizable for bot detection in other health-related social media cohorts.
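For readers unfamiliar with the metric, the following is a minimal sketch of how a per-class F1-score is computed, and of the arithmetic behind the reported gain. The counts are hypothetical and chosen only so that the result equals 0.7; they are not taken from the paper. The baseline value is not stated in the abstract and is derived here under the assumption that the reported improvement of 0.339 is an absolute difference in F1.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 for a single class (here, "bot") from its confusion counts."""
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)  # share of predicted bots that are bots
    recall = tp / (tp + fn)     # share of actual bots that were found
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts (NOT from the paper): precision = recall = 0.7.
example_f1 = f1_score(tp=7, fp=3, fn=3)

# Figures reported in the abstract.
reported_f1 = 0.7
reported_gain = 0.339

# Assumption: the gain is an absolute F1 difference, so the original
# system's F1 on the "bot" class would be the reported score minus the gain.
implied_baseline_f1 = round(reported_f1 - reported_gain, 3)

print(f"example F1: {example_f1:.3f}")
print(f"implied baseline F1: {implied_baseline_f1:.3f}")
```

Because F1 is the harmonic mean of precision and recall, a score of 0.7 is consistent with many precision and recall pairs; the counts above are only one such illustration.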