Engineering Research Center of EMR and Intelligent Expert System, Ministry of Education, Collaborative Innovation Center for Diagnosis and Treatment of Infectious Diseases, College of Biomedical Engineering and Instrument Science, Zhejiang University, No. 38 Zheda Road, Hangzhou, 310027, Zhejiang Province, China.
Department of Surgical Oncology, Second Affiliated Hospital, Zhejiang University School of Medicine, No. 88 Jiefang Road, Hangzhou, 310009, Zhejiang Province, China.
BMC Cancer. 2018 Nov 8;18(1):1084. doi: 10.1186/s12885-018-4985-2.
An increasing number of studies have identified spatial differences in colorectal cancer survival. However, little is known about whether the effects of predictors vary spatially in colorectal cancer survival prediction models, which typically estimate absolute survival risk for patients drawn from a wide range of populations. This study aimed to demonstrate the spatially varying effects of predictors of survival for nonmetastatic colorectal cancer patients.
Patients diagnosed with nonmetastatic colorectal cancer from 2004 to 2013 and followed up through the end of 2013 were extracted from the Surveillance, Epidemiology, and End Results (SEER) registry (n = 128,061). The log-rank test and the restricted mean survival time were used to evaluate differences in survival outcomes among spatial clusters for a widely used clinical predictor: stage as determined by the AJCC 7th edition staging system. The heterogeneity test used in meta-analyses was applied to assess the spatially varying effects of single predictors. These predictors were then entered into a standard survival prediction model fitted to the spatially clustered data; the spatially varying coefficients of these models indicated that some covariate effects may not be constant across the geographic regions of the study. Finally, two types of survival prediction models (a statistical model and a machine learning model) were built; both incorporated the predictors and enabled survival prediction for patients from a wide range of geographic regions.
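As an illustration of the meta-analytic heterogeneity test mentioned above, the following is a minimal sketch of Cochran's Q and the I² statistic applied to per-region effect estimates. It assumes each region contributes one estimate (for example, a log hazard ratio for a single predictor) with its standard error; the function name `cochran_q` and the example numbers are hypothetical and are not taken from the study, which does not state the exact variant of the test used.

```python
def cochran_q(estimates, std_errors):
    """Cochran's Q and I^2 for per-region effect estimates.

    estimates  : per-region effects, e.g. log hazard ratios (assumed input)
    std_errors : matching standard errors, all > 0
    Returns (pooled_estimate, Q, degrees_of_freedom, I2).
    """
    if len(estimates) != len(std_errors) or len(estimates) < 2:
        raise ValueError("need at least two estimates with matching standard errors")

    # Inverse-variance weights, as in a fixed-effect meta-analysis.
    weights = [1.0 / se ** 2 for se in std_errors]
    pooled = sum(w * b for w, b in zip(weights, estimates)) / sum(weights)

    # Q is the weighted sum of squared deviations from the pooled estimate.
    q = sum(w * (b - pooled) ** 2 for w, b in zip(weights, estimates))
    df = len(estimates) - 1

    # I^2: share of total variation attributed to heterogeneity, floored at 0.
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    return pooled, q, df, i2


# Hypothetical log hazard ratios for one predictor in two regions.
pooled, q, df, i2 = cochran_q([0.0, 2.0], [0.5, 0.5])
print(pooled, q, df, i2)
```

Under the null hypothesis of a common effect, Q is compared against a chi-square distribution with `df` degrees of freedom; a large Q (or a high I²) suggests that the predictor's effect differs between regions.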
Based on univariate and multivariate analyses, some prognostic factors, such as TNM stage, tumor size, and age at diagnosis, had significant spatially varying effects across regions. When these spatially varying effects were considered, the machine learning models had fewer assumption constraints (such as the proportional hazards assumption) and better predictive performance than the statistical models. By concordance index, the machine learning model was more accurate (0.898 [0.895, 0.902]) than the statistical model (0.732 [0.726, 0.738]).
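For readers unfamiliar with the concordance index used to compare the two models, the following is a minimal sketch of Harrell's C-index for right-censored data. It assumes a higher risk score means shorter expected survival, and it ignores pairs with tied survival times; the function name `concordance_index` and the toy data are hypothetical, and the study's exact estimator and its interval computation are not reproduced here.

```python
def concordance_index(times, events, risk_scores):
    """Harrell's C-index for right-censored survival data.

    times       : observed follow-up times
    events      : 1/True if the event was observed, 0/False if censored
    risk_scores : model output, higher = higher predicted risk (assumption)
    """
    if not (len(times) == len(events) == len(risk_scores)):
        raise ValueError("inputs must have the same length")

    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored subject cannot be the earlier member of a pair
        for j in range(n):
            if times[i] < times[j]:
                # Subject i had the event before subject j's observed time.
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # tied predictions count as half

    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data: the first patient is censored, so only one pair is comparable.
c = concordance_index([1, 2, 3], [0, 1, 1], [1.0, 3.0, 2.0])
print(c)
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking of patients by risk, which is the scale on which the reported values of 0.898 and 0.732 should be read.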
Based on this study, we recommend that the spatially varying effects of predictors be considered when building survival prediction models on large-scale, multicenter research data. Machine learning models, which are not constrained by such statistical assumptions, are a promising alternative.