Jahanbin Kia, Rahmanian Fereshte, Rahmanian Vahid, Jahromi Abdolreza Sotoodeh
Research Center for Social Determinants of Health, Jahrom University of Medical Sciences, Jahrom, Iran.
Zoonoses Research Center, Jahrom University of Medical Sciences, Jahrom, Iran.
GMS Hyg Infect Control. 2019 Dec 2;14:Doc19. doi: 10.3205/dgkh000334. eCollection 2019.
With advances in communication technology and growing access to social networks, these networks now play an important role in disseminating information and news without passing through the time-consuming channels of official news outlets. The analysis of social networking data is a new and interesting branch of text mining. This study aimed to develop a text mining technique for extracting information about infectious diseases from tweets and news on social media. A method called "Fuzzy Algorithm for Extraction, Monitoring, and Classification of Infectious Diseases" (FAEMC-ID) was developed using fuzzy modeling of the Takagi-Sugeno-Kang type. In addition to real-time classification, the method can update its vocabulary with new keywords and visualize the classified data on a world map to mark high-risk areas. As an example, measles-related news items were monitored over a 183-hour period from 1 March 2019 (01:00 am) to 8 March 2019 (12:00 pm), which corresponded to 2,870 tweets from 2,556 users. The number of tweets posted from each region ranged from 1 to 47, with the highest number, 47 tweets, coming from Canada. Most measles-related news originated in the Americas and Europe, mainly in the United States and Canada. A performance comparison with other algorithms in the literature showed that the method achieved excellent precision, a recall of 88.41%, and high inter-correlation of the data within each class. The proposed algorithm can also be used to develop more effective monitoring and tracking systems for other human and even animal health hazards.
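To make the idea of Takagi-Sugeno-Kang (TSK) fuzzy classification of tweets more concrete, the following is a minimal sketch and not the FAEMC-ID implementation, whose rules and features are not given in this abstract. It assumes two hypothetical inputs (the share of tokens found in a disease vocabulary and in an outbreak-context vocabulary), three illustrative rules, and piecewise-linear membership functions. The vocabularies, the `full=0.2` saturation point, and all function names are assumptions made for illustration only.

```python
import re

# Hypothetical vocabularies, chosen only for illustration.
DISEASE_TERMS = {"measles", "rubeola", "mmr"}
CONTEXT_TERMS = {"outbreak", "cases", "infected", "vaccine", "epidemic"}


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())


def high(x, full=0.2):
    """Membership in HIGH: rises linearly from 0 at x=0 to 1 at x=full."""
    return max(0.0, min(1.0, x / full))


def low(x, full=0.2):
    """Membership in LOW: the complement of HIGH."""
    return 1.0 - high(x, full)


def tsk_relevance(text):
    """Return a relevance score in [0, 1] from a three-rule TSK model."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    # Crisp inputs: share of tokens that match each vocabulary.
    x1 = sum(t in DISEASE_TERMS for t in tokens) / len(tokens)
    x2 = sum(t in CONTEXT_TERMS for t in tokens) / len(tokens)

    # Each rule is (firing strength, consequent); product acts as fuzzy AND.
    rules = [
        (low(x1), 0.0),                         # no disease terms -> irrelevant
        (high(x1) * low(x2), 0.5 + 0.5 * x1),   # disease terms, little context
        (high(x1) * high(x2), 1.0),             # disease terms plus context
    ]
    # TSK output: firing-strength-weighted average of the rule consequents.
    total = sum(w for w, _ in rules)
    return sum(w * y for w, y in rules) / total


if __name__ == "__main__":
    for tweet in ["Nice weather today",
                  "Measles is bad",
                  "Measles outbreak: new cases reported"]:
        print(f"{tsk_relevance(tweet):.2f}  {tweet}")
```

In a monitoring pipeline, a score like this could be thresholded to decide whether a tweet enters a disease class. The second rule has a consequent that is linear in an input, which is what distinguishes a TSK model from a Mamdani-type one.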