Li Ruidong, Zhu Jianguo, Zhong Wei-De, Jia Zhenyu
Department of Botany and Plant Sciences, University of California, Riverside, California.
Graduate Program in Genetics, Genomics, and Bioinformatics, University of California, Riverside, California.
Cancer Res. 2022 May 3;82(9):1832-1843. doi: 10.1158/0008-5472.CAN-21-3074.
Overtreatment remains a pervasive problem in prostate cancer management because the course of the disease is highly variable and often indolent. Molecular signatures derived from gene expression profiling have played critical roles in guiding prostate cancer treatment decisions. Many gene expression signatures have been developed to improve the risk stratification of prostate cancer, and some of them have already been applied in clinical practice. However, no comprehensive evaluation has been performed to compare the performance of these signatures. In this study, we conducted a systematic and unbiased evaluation of 15 machine learning (ML) algorithms and 30 published prostate cancer gene expression-based prognostic signatures, leveraging 10 transcriptomics datasets from public data repositories that together comprise 1,558 patients with primary prostate cancer. This analysis revealed that survival analysis models outperformed binary classification models for risk assessment, and that two survival analysis methods, the Cox model regularized with a ridge penalty (Cox-Ridge) and partial least squares (PLS) regression for the Cox model (Cox-PLS), were generally more robust than the other methods. Based on the Cox-Ridge algorithm, several top prognostic signatures displayed comparable or even better performance than commercial panels. These findings will facilitate the identification of existing prognostic signatures that are promising for further validation in prospective studies and promote the development of robust prognostic models to guide clinical decision-making. Moreover, this study provides a valuable data resource from large primary prostate cancer cohorts, which can be used to develop, validate, and evaluate novel statistical methodologies and molecular signatures to improve prostate cancer management.
This systematic evaluation of 15 machine learning algorithms and 30 published gene expression signatures for the prognosis of prostate cancer will assist clinical decision-making.
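To make the Cox-Ridge idea named in the abstract concrete, the following is a minimal sketch in pure Python and is not the study's pipeline. It fits a Cox model by plain gradient descent on the negative log partial likelihood plus a ridge penalty, then scores it with Harrell's concordance index. The gradient-descent solver, the choice of concordance index as the metric, the synthetic cohort, and all names (`simulate_cohort`, `fit_cox_ridge`, `alpha`, `lr`) are assumptions made for illustration. The sketch also assumes no tied event times.

```python
import math
import random


def linear_scores(beta, X):
    """Risk score (linear predictor) for each sample."""
    return [sum(b * x for b, x in zip(beta, row)) for row in X]


def cox_ridge_gradient(beta, X, time, event, alpha):
    """Gradient of (negative log partial likelihood) / n + (alpha / 2) * ||beta||^2."""
    n, p = len(X), len(beta)
    eta = linear_scores(beta, X)
    grad = [alpha * b for b in beta]  # ridge term
    for i in range(n):
        if not event[i]:
            continue  # censored samples enter only through risk sets
        risk_set = [j for j in range(n) if time[j] >= time[i]]
        weights = [math.exp(eta[j]) for j in risk_set]
        total = sum(weights)
        for k in range(p):
            weighted_mean = sum(w * X[j][k] for w, j in zip(weights, risk_set)) / total
            grad[k] -= (X[i][k] - weighted_mean) / n
    return grad


def fit_cox_ridge(X, time, event, alpha=0.1, lr=0.1, n_iter=200):
    """Fixed-step gradient descent; a real analysis would use a dedicated solver."""
    beta = [0.0] * len(X[0])
    for _ in range(n_iter):
        grad = cox_ridge_gradient(beta, X, time, event, alpha)
        beta = [b - lr * g for b, g in zip(beta, grad)]
    return beta


def concordance_index(risk, time, event):
    """Harrell's C: share of comparable pairs in which the earlier failure has the higher risk."""
    concordant, comparable = 0.0, 0
    for i in range(len(time)):
        if not event[i]:
            continue
        for j in range(len(time)):
            if time[i] < time[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    return concordant / comparable


def simulate_cohort(n, true_beta, seed):
    """Hypothetical synthetic cohort: exponential survival times, about 30% censored."""
    rng = random.Random(seed)
    X, time, event = [], [], []
    for _ in range(n):
        row = [rng.gauss(0.0, 1.0) for _ in true_beta]
        hazard = math.exp(sum(b * x for b, x in zip(true_beta, row)))
        t = rng.expovariate(hazard)
        observed = rng.random() < 0.7
        if not observed:
            t *= rng.random()  # censoring happens before the true event time
        X.append(row)
        time.append(t)
        event.append(observed)
    return X, time, event


X, time, event = simulate_cohort(60, [1.0, -1.0, 0.0], seed=0)
beta_hat = fit_cox_ridge(X, time, event)
c_index = concordance_index(linear_scores(beta_hat, X), time, event)
print("coefficients:", [round(b, 3) for b in beta_hat])
print("training C-index:", round(c_index, 3))
```

The ridge penalty shrinks all coefficients toward zero without removing any gene from the signature. The C-index here is computed on the training data, so it is optimistic; a fair comparison of signatures would score held-out cohorts.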