Universidade de Pernambuco, Programa de Pós-Graduação em Engenharia da Computação, Recife, 50720-001, Brazil.
Fundação de Medicina Tropical Dr. Heitor Vieira Dourado, Manaus, 69040-000, Brazil.
Sci Data. 2022 May 10;9(1):198. doi: 10.1038/s41597-022-01312-7.
Arboviruses are one of the main categories of Neglected Tropical Diseases (NTDs), and Dengue and Chikungunya are the most common among them. Arboviruses mainly affect tropical countries. Brazil has the largest absolute number of cases in Latin America. This work presents a unified data set with clinical, sociodemographic, and laboratory data on patients with confirmed Dengue or Chikungunya, as well as patients for whom infection with these diseases was ruled out. The data is based on case notifications submitted to the Brazilian Information System for Notifiable Diseases (SINAN, from the Portuguese Sistema de Informação de Agravo de Notificação) from 2013 to 2020. The original data set comprised 13,421,230 records and 118 attributes. After pre-processing, the final data set contains 7,632,542 records and 56 attributes. The data presented in this work will assist researchers in investigating antecedents of arbovirus emergence and transmission in general, and of Dengue and Chikungunya in particular. Furthermore, it can be used to train and test machine learning models for differential diagnosis and multi-class classification.
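To put the effect of pre-processing in perspective, the short Python sketch below works out how many records and attributes were removed and what fraction was retained. It uses only the four counts quoted in the abstract; the constant names and the `reduction` helper are illustrative assumptions and are not part of the published data set or its tooling.

```python
# Counts quoted in the abstract (original vs. final data set).
ORIGINAL_RECORDS = 13_421_230
ORIGINAL_ATTRIBUTES = 118
FINAL_RECORDS = 7_632_542
FINAL_ATTRIBUTES = 56


def reduction(before: int, after: int) -> tuple[int, float]:
    """Return (number removed, fraction retained) for a before/after pair.

    Hypothetical helper written for this illustration only.
    """
    return before - after, after / before


removed_records, record_retention = reduction(ORIGINAL_RECORDS, FINAL_RECORDS)
removed_attributes, attribute_retention = reduction(
    ORIGINAL_ATTRIBUTES, FINAL_ATTRIBUTES
)

print(f"Records removed:    {removed_records:,} ({record_retention:.1%} retained)")
print(f"Attributes removed: {removed_attributes} ({attribute_retention:.1%} retained)")
```

By these figures, pre-processing kept a little over half of the records and just under half of the attributes. The abstract does not say which filtering steps account for the reduction, so the sketch does not attempt to reproduce them.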