Department of Epidemiology, Faculty of Public Health, University of São Paulo, São Paulo, Brazil.
Int J Public Health. 2023 Jul 20;68:1604789. doi: 10.3389/ijph.2023.1604789. eCollection 2023.
Our aim was to test whether machine learning algorithms can predict cancer mortality (CM) at an ecological level and to use these predictions to identify statistically significant spatial clusters of excess cancer mortality (eCM). Age-standardized CM was extracted from the official databases of Brazil. Predictive features included sociodemographic and health coverage variables. Machine learning algorithms were selected and trained with 70% of the data, and their performance was tested with the remaining 30%. Clusters of eCM were identified using SaTScan. Separate analyses were also performed for the 10 most frequent cancer types. The gradient boosting trees algorithm presented the highest coefficient of determination (R² = 0.66). For total cancer, the clusters from all algorithms overlapped in the region of Bagé (27% eCM). For esophageal cancer, the clusters from all algorithms overlapped in west Rio Grande do Sul (48%-96% eCM). The most significant cluster for stomach cancer was in Macapá (82% eCM). The most important predictive variables were the percentage of the white population and the percentage of residents with computers. We found consistent and well-defined geographic regions in Brazil with significantly higher than expected cancer mortality.
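The evaluation steps described above can be illustrated with a minimal Python sketch. It is not the authors' code: the function names are hypothetical, the 70/30 split is a plain random shuffle, and eCM is assumed here to be the percentage by which observed mortality exceeds the model's prediction, which the abstract does not state explicitly. The model itself (for example, gradient boosting trees) and the SaTScan cluster detection are outside the scope of the sketch.

```python
import random


def split_70_30(rows, seed=0):
    """Shuffle rows and split them into 70% training and 30% test data."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(0.7 * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def excess_mortality_pct(observed, predicted):
    """Assumed definition: percent by which observed CM exceeds predicted CM."""
    return [100.0 * (o - p) / p for o, p in zip(observed, predicted)]
```

Under this assumed definition, a municipality with an observed rate of 127 and a predicted rate of 100 (in the same units) would have 27% eCM.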