Institute for Tropical Biology and Conservation, Universiti Malaysia Sabah, Jalan UMS, Kota Kinabalu, Sabah 88400, Malaysia.
School of Computer Sciences, Universiti Sains Malaysia, Penang 11800, Malaysia.
Acta Trop. 2022 Jul;231:106447. doi: 10.1016/j.actatropica.2022.106447. Epub 2022 Apr 14.
Mosquito-borne diseases are emerging and re-emerging across the globe, especially since the COVID-19 pandemic. Recent advances in text mining for infectious diseases hold the potential to provide timely access to explicit and implicit associations among information in text. In the past few years, the availability of online text data from this domain, in unstructured or semi-structured form and rich in information, has enabled many studies to offer solutions in areas such as disease-related knowledge discovery, disease surveillance, and early detection systems. However, to the best of our knowledge, no recent review of text mining in the domain of mosquito-borne diseases was available. In this review, we survey recent work on text mining techniques used in combating mosquito-borne diseases. We highlight the corpus sources, technologies, applications, and challenges faced by the studies, followed by possible future directions in this domain. We present a bibliometric analysis of 294 scientific articles on text mining in mosquito-borne diseases indexed in Scopus and PubMed from 2016 to 2021. The papers were further filtered and reviewed based on the techniques used to analyze text related to mosquito-borne diseases. From the corpus of 158 selected articles, we found that 27 were relevant and used text mining in mosquito-borne diseases. These articles mostly covered Zika (38.70%), dengue (32.26%), and malaria (29.03%), with very few or none covering other important mosquito-borne diseases such as chikungunya, yellow fever, and West Nile fever. Twitter was the dominant corpus source for text mining in mosquito-borne diseases, followed by the PubMed and LexisNexis databases.
Sentiment analysis was the most popular text mining technique, used to understand the discourse around a disease, followed by information extraction, which used dependency-relation and co-occurrence-based approaches to extract relations and events. Surveillance was the main application in most of the reviewed studies, followed by treatment, which focused on drug-disease or symptom-disease associations. Advances in text mining could improve the management of mosquito-borne diseases. However, the techniques and applications face many limitations and challenges, including biases related to user authentication and language, and difficulties in real-world implementation. We discuss future directions that could help expand this area. This review serves mainly as a library of text mining work on mosquito-borne diseases and could support further exploration of such systems for other neglected diseases.