School of Mathematics, Statistics and Computer Science, University of KwaZulu-Natal, Pietermaritzburg, Scottsville, South Africa.
Faculty of Mathematical and Computer Sciences, University of Gezira, Wad Madani, Sudan.
PLoS One. 2021 Dec 29;16(12):e0261625. doi: 10.1371/journal.pone.0261625. eCollection 2021.
Understanding and identifying the markers and clinical information associated with colorectal cancer (CRC) patient survival is needed for early detection and diagnosis. In this work, we aimed to build a simple model using Cox proportional hazards (PH) and random survival forest (RSF) and to find a robust signature for predicting CRC overall survival. We used stepwise regression to develop a Cox PH model that analyses 54 differentially expressed genes common to three mutations. RSF was fitted with the log-rank and log-rank-score splitting rules, each using 5000 survival trees, and variable importance was then used to identify the genes most influential for CRC survival. We compared the predictive performance of the Cox PH model and RSF for early CRC detection and diagnosis. The results indicate that the SLC9A8, IER5, ARSJ, ANKRD27, and PIPOX genes were significantly associated with CRC overall survival. Age, sex, and stage also affected CRC overall survival. The RSF model with log-rank splitting performed better than the one with log-rank-score splitting, which needed more trees to stabilize. Overall, imputing missing values improved the models' predictive performance, and the Cox PH model outperformed RSF.
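To make the log-rank splitting rule concrete, here is a minimal pure-Python sketch of how a single survival-tree node might score candidate splits on one numeric covariate (for example, a gene's expression level). It computes the standardized two-sample log-rank statistic between the two daughter nodes and keeps the threshold with the largest absolute value. The function names and toy data are hypothetical; a full RSF implementation also draws bootstrap samples, samples candidate variables and thresholds at random, and grows many trees (5000 in this study), none of which is shown here.

```python
import math
from typing import Optional, Sequence, Tuple


def log_rank_split_statistic(
    times: Sequence[float],
    events: Sequence[int],
    go_left: Sequence[bool],
) -> float:
    """Standardized log-rank statistic comparing the left and right daughters.

    times:   follow-up time of each sample in the parent node
    events:  1 if death was observed at that time, 0 if censored
    go_left: True if the sample is sent to the left daughter node
    """
    event_times = sorted({t for t, d in zip(times, events) if d == 1})
    numerator = 0.0
    variance = 0.0
    for t in event_times:
        # Samples still at risk just before time t (overall and in the left node).
        at_risk = sum(1 for ti in times if ti >= t)
        at_risk_left = sum(1 for ti, g in zip(times, go_left) if ti >= t and g)
        # Deaths occurring exactly at time t (overall and in the left node).
        deaths = sum(1 for ti, d in zip(times, events) if ti == t and d == 1)
        deaths_left = sum(
            1 for ti, d, g in zip(times, events, go_left) if ti == t and d == 1 and g
        )
        # Observed minus expected deaths in the left node under "no difference".
        numerator += deaths_left - at_risk_left * deaths / at_risk
        # Hypergeometric variance term; undefined (and skipped) when only one is at risk.
        if at_risk > 1:
            frac = at_risk_left / at_risk
            variance += frac * (1 - frac) * (at_risk - deaths) / (at_risk - 1) * deaths
    if variance == 0.0:
        return 0.0
    return numerator / math.sqrt(variance)


def best_log_rank_split(
    x: Sequence[float],
    times: Sequence[float],
    events: Sequence[int],
) -> Tuple[Optional[float], float]:
    """Return (threshold, |statistic|) for the best split x <= threshold."""
    best_threshold: Optional[float] = None
    best_stat = 0.0
    # Every observed value except the largest is a candidate threshold.
    for c in sorted(set(x))[:-1]:
        go_left = [xi <= c for xi in x]
        stat = abs(log_rank_split_statistic(times, events, go_left))
        if stat > best_stat:
            best_threshold, best_stat = c, stat
    return best_threshold, best_stat


if __name__ == "__main__":
    # Toy node: one covariate, survival times, and death indicators.
    x = [1, 2, 3, 4, 5, 6]
    times = [2, 3, 4, 10, 12, 15]
    events = [1, 1, 1, 1, 0, 1]
    print(best_log_rank_split(x, times, events))
```

Swapping which daughter is called "left" only flips the sign of the statistic, which is why the split search compares absolute values.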
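The abstract does not name the metric used to compare the predictive performance of Cox PH and RSF. Harrell's concordance index (C-index) is a common choice for survival models, so the sketch below assumes it. It counts usable pairs, meaning pairs where the patient with the shorter time had an observed death, and measures how often the model assigns that patient the higher risk score. The function name and the example risk scores are hypothetical.

```python
from typing import Sequence


def harrell_c_index(
    times: Sequence[float],
    events: Sequence[int],
    risk_scores: Sequence[float],
) -> float:
    """Harrell's C-index: higher risk score should mean shorter survival.

    A pair (i, j) is comparable when patient i died (events[i] == 1) and
    times[i] < times[j]. Ties in risk score count as half-concordant.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if events[i] != 1:
            continue  # censored patients cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs in the data")
    return concordant / comparable


if __name__ == "__main__":
    times = [2, 3, 4, 10, 12, 15]
    events = [1, 1, 1, 1, 0, 1]
    # Hypothetical risk scores from two fitted models on the same patients.
    model_a = [6, 5, 4, 3, 2, 1]  # ranks patients perfectly
    model_b = [5, 6, 4, 3, 2, 1]  # swaps the two earliest deaths
    print(harrell_c_index(times, events, model_a))
    print(harrell_c_index(times, events, model_b))
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking. Evaluating both models on the same held-out patients keeps the comparison fair.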