Yao Sijie, Cao Biwei, Li Tingyi, Kalos Denise, Yuan Yading, Wang Xuefeng
Department of Biostatistics and Bioinformatics, H. Lee Moffitt Cancer Center & Research Institute, Tampa, FL 33612, USA.
Department of Radiation Oncology, Icahn School of Medicine at Mount Sinai, New York City, NY 10029, USA.
NAR Genom Bioinform. 2023 Jun 16;5(2):lqad055. doi: 10.1093/nargab/lqad055. eCollection 2023 Jun.
Identifying novel and reliable prognostic biomarkers for predicting patient survival outcomes is essential for deciding on personalized treatment strategies for diseases such as cancer. Numerous feature selection techniques have been proposed to address the high-dimensionality problem in constructing prediction models. Feature selection not only lowers the data dimension but also improves the prediction accuracy of the resulting models by mitigating overfitting. The performance of these feature selection methods when applied to survival models, however, deserves further investigation. In this paper, we construct and compare a series of prediction-oriented biomarker selection frameworks by leveraging recent machine learning algorithms, including random survival forests, extreme gradient boosting, light gradient boosting and deep learning-based survival models. Additionally, we adapt the recently proposed prediction-oriented marker selection (PROMISE) to a survival model (PROMISE-Cox) as a benchmark approach. Our simulation studies indicate that boosting-based approaches tend to provide superior accuracy, with better true positive and false positive rates, in more complicated scenarios. For demonstration purposes, we applied the proposed biomarker selection strategies to identify prognostic biomarkers in different modalities of head and neck cancer data.