Soleimani Mohsen, Chiti Hossein
Metabolic Diseases Research Center, Health and Metabolic Diseases Research Institute, Zanjan University of Medical Sciences, Zanjan, Iran.
Sci Rep. 2025 Aug 14;15(1):29901. doi: 10.1038/s41598-025-15324-x.
The global escalation of thyroid cancer (TC) incidence, coupled with pronounced provincial and gender-based disparities in Iran, underscores an urgent public health challenge that remains underexplored through integrative analyses of environmental, socioeconomic, and healthcare factors. This study addresses this critical gap by employing an advanced multi-model machine learning (ML) framework to elucidate the spatiotemporal determinants of TC incidence across Iran's 31 provinces, offering novel insights to inform evidence-based public health strategies. Leveraging data from the Iranian National Population-based Cancer Registry (INPCR) spanning 2014-2017, we synthesized a comprehensive dataset comprising 55 variables sourced from diverse public repositories. Age-standardized incidence rates (ASRs) were meticulously computed and stratified by sex and province, followed by the application of nine ML models for feature selection including Random Forest, XG-Boost, Cat-Boost, and various regression techniques. The significance of identified predictors was rigorously validated using SHAP (SHapley Additive exPlanations) analysis across Random Forest, XG-Boost, and Cat-Boost frameworks. The analysis disclosed considerable variation in TC incidence across the population. The overall four-year ASR was 11.13 per 100,000, with females exhibiting a markedly higher rate of 35.1 per 100,000, significantly exceeding that of males at 9.6 per 100,000. Prominent predictors included Sunshine Duration (SHAP values: -0.046 overall, 0.015 in females, -0.005 in males), Provincial-Education-rates, Elevation in Meter, Laboratory-availability, and community Marriage-rates. Significant provincial disparities were observed in the mean ASR of TC across the entire population, notably exemplified by Yazd's elevated mean ASR of 9.2 per 100,000 in contrast to Semnan's markedly lower rate of 1.2 per 100,000 over the period 2014-2017.
The ML models demonstrated moderate to robust predictive accuracy (R²: 0.21-0.86), underscoring distinct sex-specific risk profiles. This pioneering study illuminates the pivotal roles of climatic, socioeconomic, healthcare access, and environmental factors in shaping TC incidence in Iran, revealing significant regional and gender-specific variations. These findings advocate for the development of targeted public health interventions aimed at mitigating environmental exposures and rectifying healthcare disparities, thereby enhancing the precision and efficacy of TC prevention strategies.
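The age-standardized incidence rates reported above are weighted averages of age-specific rates. The abstract does not state which standard population or age bands were used, so the following is only a minimal sketch of direct standardization under assumed inputs: the function name `age_standardized_rate`, the three age bands, and all counts and weights are hypothetical illustrations, not INPCR data and not the authors' code.

```python
# Direct age standardization: a weighted average of age-specific rates,
# with weights taken from a reference ("standard") population.
# All names and numbers here are hypothetical, for illustration only.

def age_standardized_rate(cases, person_years, standard_weights, per=100_000):
    """Return the directly age-standardized rate per `per` population.

    cases            -- new cases observed in each age group
    person_years     -- population at risk (person-years) in each age group
    standard_weights -- standard-population size or share for each age group
    """
    if not (len(cases) == len(person_years) == len(standard_weights)):
        raise ValueError("all inputs must have one entry per age group")
    total_weight = sum(standard_weights)
    if total_weight <= 0:
        raise ValueError("standard weights must sum to a positive value")

    weighted_sum = 0.0
    for d, n, w in zip(cases, person_years, standard_weights):
        if n <= 0:
            raise ValueError("person-years must be positive")
        weighted_sum += w * (d / n)  # weight times age-specific rate
    return per * weighted_sum / total_weight


# Hypothetical example with three age bands (young, middle, old).
cases = [2, 10, 6]
person_years = [100_000, 100_000, 50_000]
standard_weights = [50, 30, 20]  # assumed standard population shares

asr = age_standardized_rate(cases, person_years, standard_weights)
crude = 100_000 * sum(cases) / sum(person_years)
print(f"crude rate: {crude:.2f} per 100,000")
print(f"ASR:        {asr:.2f} per 100,000")
```

In this toy example the crude rate is 7.2 per 100,000 while the standardized rate is 6.4 per 100,000, because the assumed standard population gives more weight to the youngest band than the observed population does. Adjusting for this kind of difference in age structure is what makes rates comparable between provinces or between sexes.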