University of Minnesota, Minneapolis - Minneapolis (MN), United States.
Universidade Federal Fluminense - Niterói (RJ), Brazil.
Rev Bras Epidemiol. 2024 May 13;27:e240024. doi: 10.1590/1980-549720240024. eCollection 2024.
Tuberculosis (TB) is the second deadliest infectious disease globally and poses a significant burden in Brazil and its Amazonian region. This study focused on the "riverine municipalities" and hypothesized the presence of TB clusters in the area. We also aimed to train a machine learning model to differentiate municipalities classified as hot spots vs. non-hot spots, using disease surveillance variables as predictors.
Data on TB incidence in the riverine municipalities from 2019 to 2022 were collected from the Brazilian Health Ministry Informatics Department. Moran's I was used to assess global spatial autocorrelation, and the Getis-Ord Gi* statistic was used to detect high- and low-incidence clusters. A Random Forest machine learning model was then trained on surveillance variables related to TB cases to classify municipalities as hot spots or non-hot spots.
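The two spatial statistics named above can be illustrated with a minimal sketch. The abstract does not state the spatial weight matrix or software used, so this assumes a hypothetical binary contiguity matrix for six toy municipalities arranged west to east and made-up incidence values (not study data). Global Moran's I summarizes overall clustering; Getis-Ord Gi* returns a z-score per municipality, with self-weights included, where large positive values suggest hot spots and large negative values cold spots.

```python
import math


def morans_i(x, w):
    """Global Moran's I for values x and an n-by-n weight matrix w (zero diagonal)."""
    n = len(x)
    mean = sum(x) / n
    dev = [xi - mean for xi in x]
    w_sum = sum(sum(row) for row in w)
    # Cross-products of deviations between neighbouring areas.
    num = sum(w[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n))
    den = sum(d * d for d in dev)
    return (n / w_sum) * (num / den)


def getis_ord_gi_star(x, w):
    """Gi* z-scores; w must include self-weights (w[i][i] = 1), as Gi* requires."""
    n = len(x)
    mean = sum(x) / n
    s = math.sqrt(sum(xi * xi for xi in x) / n - mean * mean)
    z = []
    for i in range(n):
        sw = sum(w[i])                      # sum of weights for area i
        sw2 = sum(wij * wij for wij in w[i])  # sum of squared weights
        num = sum(wij * xj for wij, xj in zip(w[i], x)) - mean * sw
        den = s * math.sqrt((n * sw2 - sw * sw) / (n - 1))
        z.append(num / den)
    return z


# Hypothetical incidence per 100,000 for six municipalities ordered west to east.
incidence = [90, 85, 80, 20, 15, 10]
n = len(incidence)
# Hypothetical contiguity: each municipality borders its immediate neighbours.
w_rook = [[1 if abs(i - j) == 1 else 0 for j in range(n)] for i in range(n)]
# Gi* variant of the same matrix, with each area counted as its own neighbour.
w_star = [[1 if abs(i - j) <= 1 else 0 for j in range(n)] for i in range(n)]

i_global = morans_i(incidence, w_rook)
gi_z = getis_ord_gi_star(incidence, w_star)
print(round(i_global, 3), [round(z, 2) for z in gi_z])
```

In practice, a library such as PySAL's `esda` would typically be used, with permutation-based inference rather than the raw statistics shown here.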
Our analysis revealed distinct geographical clusters of high and low TB incidence following a west-to-east distribution pattern. The Random Forest classification model used six surveillance variables to predict hot vs. non-hot spots and achieved an area under the receiver operating characteristic curve (AUC-ROC) of 0.81.
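The AUC-ROC reported above can be read as the probability that a randomly chosen hot spot receives a higher predicted score than a randomly chosen non-hot spot. The sketch below computes it in that pairwise (Mann-Whitney) form, which equals the area under the empirical ROC curve when ties count as one half. The labels and scores are hypothetical; the abstract does not say how the study's AUC was validated (e.g., hold-out set or cross-validation).

```python
def auc_roc(labels, scores):
    """AUC-ROC via pairwise comparison of positive (1) and negative (0) scores."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one hot spot and one non-hot spot")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0   # hot spot correctly ranked above non-hot spot
            elif p == q:
                wins += 0.5   # ties count as half a correct ranking
    return wins / (len(pos) * len(neg))


# Hypothetical labels (1 = hot spot) and predicted hot-spot probabilities.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2]
auc = auc_roc(labels, scores)
print(round(auc, 3))
```

This quadratic version is meant for clarity; for larger datasets a rank-based computation or a library routine such as scikit-learn's `roc_auc_score` is the usual choice.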
Higher percentages of recurrent cases, deaths due to TB, antibiotic regimen changes, new cases, and cases with a smoking history were the best predictors of hot spot status. This prediction method can be leveraged to identify the municipalities at highest risk of being disease hot spots, giving policymakers an evidence-based tool to direct resource allocation for disease control in the riverine municipalities.