Oswaldo Cruz Foundation (Fiocruz), Rio de Janeiro, Brazil.
Military Institute of Engineering (IME), Rio de Janeiro, Brazil.
Sci Data. 2022 Aug 10;9(1):489. doi: 10.1038/s41597-022-01581-2.
The lack of georeferencing in geospatial datasets hinders scientific studies that rely on accurate data. This is particularly concerning in the health sciences, where georeferenced data could lead to scientific results of great relevance to society. The Brazilian health systems, especially those for Notifiable Diseases, in practice do not register georeferenced data; instead, the records indicate only the municipality in which the event occurred. In data-driven modeling, accurate occurrence-based disease prediction models typically require socioenvironmental characteristics of the exact location of each event, which are often unavailable. To enrich the expressiveness of data-driven models when the municipality of the event is the best available information, we produced datasets that statistically characterize all 5,570 Brazilian municipalities across 642 layers of thematic data representing the natural and artificial characteristics of the municipalities' landscapes over time. The result is a collection of datasets comprising a total of 11,556 descriptive statistics attributes for each municipality.
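As an illustration of the idea only, the following minimal Python sketch shows how per-layer descriptive statistics could be flattened into municipality-level attributes and joined to records that carry nothing but a municipality code. The layer names, the placeholder municipality code, the pixel values, and the five statistics are all assumptions made for the example; the abstract does not enumerate the statistics used, although 11,556 attributes over 642 layers is consistent with 18 statistics per layer (642 × 18 = 11,556).

```python
import statistics


def describe(values):
    """Return a few descriptive statistics for one layer within one municipality.

    The choice of statistics is illustrative, not the set used in the datasets.
    """
    return {
        "min": min(values),
        "max": max(values),
        "mean": statistics.fmean(values),
        "median": statistics.median(values),
        "std": statistics.pstdev(values),
    }


def municipality_features(layers):
    """Flatten {layer_name: [values]} into {"<layer>_<stat>": value}."""
    features = {}
    for layer_name, values in layers.items():
        for stat_name, stat_value in describe(values).items():
            features[f"{layer_name}_{stat_name}"] = stat_value
    return features


# Hypothetical layer values sampled inside one municipality's boundary.
# The code "0000001" and both layer names are placeholders.
layers_by_municipality = {
    "0000001": {
        "elevation_m": [2.0, 10.0, 30.0, 58.0],
        "forest_cover_2020": [0.0, 0.5, 0.5, 1.0],
    },
}

features_by_municipality = {
    code: municipality_features(layers)
    for code, layers in layers_by_municipality.items()
}

# Notification records that identify only the municipality of the event.
records = [{"disease": "dengue", "municipality": "0000001"}]

# Enrich each record with the statistical attributes of its municipality.
enriched = [
    {**record, **features_by_municipality[record["municipality"]]}
    for record in records
]
```

Each enriched record then carries one attribute per layer and statistic (for example `elevation_m_mean`), which a data-driven model can use in place of the unavailable exact-location characteristics.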