Liu Lingbo, Wang Fahui, Onega Tracy
Center for Geographic Analysis, Harvard University, Cambridge, MA, USA.
Department of Geography and Anthropology, Louisiana State University, Baton Rouge, LA, USA.
Sci Data. 2025 May 30;12(1):909. doi: 10.1038/s41597-025-05254-8.
High-quality cancer data are fundamental for public health research and policy, but cancer data for small geographic units and population subgroups in the United States are rarely available because of small-sample suppression rules, spatial coarsening, and data incompleteness. These limitations hinder high-resolution spatial analyses and precision public health interventions. This study provides a high-resolution cancer incidence dataset for the U.S., generated through a multi-constraint Monte Carlo simulation framework that reconstructs suppressed county-level cancer data and systematically disaggregates them to ZIP Code Tabulation Areas (ZCTAs), guided by demographic constraints. The method uses population subgroup structures and macro-level incidence rates as constraints so that estimates remain consistent and reliable across spatial scales. The resulting dataset covers state, county, and ZCTA levels, supporting comprehensive analyses of cancer burden, in-depth spatial analyses, and precision public health interventions at each scale.
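The disaggregation step described in the abstract can be illustrated with a minimal sketch. This is not the authors' framework: it assumes a single county total that is already known, a hypothetical input layout (`zcta_pop`, `rates`), and made-up numbers, and it shows only one constraint, namely that every simulated allocation sums back to the county count. Reconstruction of suppressed counts and any further constraints are not covered here.

```python
import random
from collections import Counter


def disaggregate(county_cases, zcta_pop, rates, n_draws=1000, seed=0):
    """Allocate a county case count to ZCTAs by repeated weighted sampling.

    county_cases: integer case total for the county (the hard constraint).
    zcta_pop: {zcta: {subgroup: population}} (hypothetical input layout).
    rates: {subgroup: incidence rate} taken from a coarser geographic level.
    Returns {zcta: mean simulated cases}; every draw sums to county_cases.
    """
    rng = random.Random(seed)
    zctas = sorted(zcta_pop)
    # Expected cases per ZCTA (population x subgroup rate) act as weights.
    weights = [
        sum(pop * rates[group] for group, pop in zcta_pop[z].items())
        for z in zctas
    ]
    totals = Counter()
    for _ in range(n_draws):
        # One multinomial draw: each case is assigned to exactly one ZCTA.
        totals.update(rng.choices(zctas, weights=weights, k=county_cases))
    return {z: totals[z] / n_draws for z in zctas}


# Illustrative inputs only; these are not values from the dataset.
zcta_pop = {
    "A": {"under_65": 8000, "65_plus": 2000},
    "B": {"under_65": 3000, "65_plus": 500},
}
rates = {"under_65": 0.002, "65_plus": 0.02}
estimate = disaggregate(40, zcta_pop, rates)
```

Averaging over many draws gives a stable estimate per ZCTA, and because each draw distributes exactly `county_cases` cases, the ZCTA estimates stay consistent with the county total.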