Montgomery Aisha, Vadapalli Ravi, Dinenno Frank A, Schilling Josh, Jain Praduman, Jacob Aasems, Chism David, Shanker Anil
Vibrent Health, 4114 Legato Rd #900, Fairfax, VA, 22033, USA.
Applied Sciences, Premier, Inc., Charlotte, NC, United States.
Sci Rep. 2025 Jul 16;15(1):25781. doi: 10.1038/s41598-025-11074-y.
Colorectal cancer (CRC) is the second leading cause of cancer death in the United States (US). Rural Appalachia has the highest CRC incidence and mortality rates. Several non-clinical social determinants of health (SDOH) are associated with cancer mortality. This study describes a novel predictive modeling approach that uses demographic, clinical, and SDOH features from health records of Appalachian community cancer centers to predict 5-year CRC survival. We trained, validated, and tested four gradient-boosted tree ensemble (XGBoost) machine learning (ML) models, each developed using a selected combination of the available features. The area under the receiver operating characteristic curve was greatest for the model that combined SDOH features with demographic and clinical features (0.79; P < 0.0001). Feature stratification identified rurality as the top SDOH feature. These results show that the ML model performs better when SDOH features are included and that rurality significantly impacts CRC survival in Appalachia. The study provides preliminary indications that further collection and evaluation of SDOH data would strengthen understanding of their impact on cancer survival in Appalachia and other underserved populations and would improve the development of care delivery strategies.
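For readers unfamiliar with the headline metric, the following is a minimal sketch of how an area under the receiver operating characteristic curve can be computed from predicted scores. It is not the authors' code. The function name `roc_auc`, the label encoding (1 = survived 5 years, 0 = did not), and the toy labels and scores are all illustrative assumptions and are not taken from the study data.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via the rank (Mann-Whitney) formulation.

    Equals the probability that a randomly chosen positive case receives
    a higher score than a randomly chosen negative case; ties count 0.5.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data, NOT from the study:
# label 1 = survived 5 years, label 0 = did not (assumed encoding).
labels = [1, 1, 1, 0, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.7, 0.6, 0.2]

auc = roc_auc(labels, scores)
print(f"AUC = {auc:.3f}")
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, which gives context for the reported value of 0.79.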