Dong Weichuan, Kim Uriel, Rose Johnie, Hoehn Richard S, Kucmanic Matthew, Eom Kirsten, Li Shu, Berger Nathan A, Koroukian Siran M
Population Cancer Analytics Shared Resource and Department of Population and Quantitative Health Sciences, Case Western Reserve University School of Medicine, Cleveland, OH 44106, USA.
Kellogg School of Management, Northwestern University, Evanston, IL 60208, USA.
Cancers (Basel). 2023 Feb 4;15(4):1006. doi: 10.3390/cancers15041006.
The proportion of patients diagnosed with colorectal cancer (CRC) at age < 50 (early-onset CRC, or EOCRC) has steadily increased over the past three decades relative to the proportion of patients diagnosed at age ≥ 50 (late-onset CRC, or LOCRC), despite the overall reduction in CRC incidence. An important gap in the literature is whether EOCRC shares the same community-level risk factors as LOCRC. Thus, we sought to (1) identify disparities in the incidence rates of EOCRC and LOCRC using geospatial analysis and (2) compare the importance of community-level risk factors (racial/ethnic, health status, behavioral, clinical care, physical environmental, and socioeconomic status risk factors) in the prediction of EOCRC and LOCRC incidence rates using a random forest machine learning approach. The incidence data came from the Surveillance, Epidemiology, and End Results program (years 2000-2019). The geospatial analysis revealed large geographic variations in EOCRC and LOCRC incidence rates. For example, some regions had relatively low LOCRC and high EOCRC rates (e.g., Georgia and eastern Texas), while others had relatively high LOCRC and low EOCRC rates (e.g., Iowa and New Jersey). The random forest analysis revealed that the community-level risk factors most predictive of EOCRC incidence rates differed meaningfully from those most predictive of LOCRC incidence rates. For example, diabetes prevalence was the most important risk factor in predicting EOCRC incidence rate, but it was a less important predictor of LOCRC incidence rate; physical inactivity was the most important risk factor in predicting LOCRC incidence rate, but it was only the fourth most important predictor of EOCRC incidence rate. Thus, our community-level analysis demonstrates the geographic variation in EOCRC burden and the distinctive set of risk factors most predictive of EOCRC.
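The comparison of predictor importance described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the software or the importance measure used, so scikit-learn's `RandomForestRegressor` and its impurity-based `feature_importances_` are assumptions here. The data are synthetic, and the predictor names (`factor_a`, `factor_b`, `factor_c`) and the helper `rank_importance` are hypothetical placeholders, not the study's variables or results.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor


def rank_importance(X, y, names, seed=0):
    """Fit a random forest and return predictor names, most important first.

    Uses impurity-based importance (an assumption; the abstract does not
    say which importance measure the study used).
    """
    model = RandomForestRegressor(n_estimators=200, random_state=seed)
    model.fit(X, y)
    order = np.argsort(model.feature_importances_)[::-1]
    return [names[i] for i in order]


# Synthetic placeholder data: 500 hypothetical "communities", 3 predictors.
rng = np.random.default_rng(0)
names = ["factor_a", "factor_b", "factor_c"]
X = rng.normal(size=(500, 3))

# Two synthetic outcomes built to depend on different predictors,
# standing in for two incidence rates modeled on the same predictors.
y_early = 3.0 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.1, size=500)
y_late = 0.5 * X[:, 0] + 3.0 * X[:, 1] + rng.normal(scale=0.1, size=500)

# One model per outcome, then compare the two importance rankings.
early_rank = rank_importance(X, y_early, names)
late_rank = rank_importance(X, y_late, names)
print("early-onset ranking:", early_rank)
print("late-onset ranking: ", late_rank)
```

Fitting one model per outcome on the same predictor set makes the two rankings directly comparable. Impurity-based importance can be biased toward high-cardinality or correlated predictors, so a permutation-based measure may be a reasonable alternative in practice.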