Prognosis of lasso-like penalized Cox models with tumor profiling improves prediction over clinical data alone and benefits from bi-dimensional pre-screening.
Author information
IRIG, Biosanté U1292, Univ. Grenoble Alpes, Inserm, CEA, Grenoble, France.
GIPSA-lab, Institute of Engineering University Grenoble Alpes, Univ. Grenoble Alpes, CNRS, Grenoble INP, Grenoble, France.
Publication information
BMC Cancer. 2022 Oct 5;22(1):1045. doi: 10.1186/s12885-022-10117-1.
BACKGROUND
Prediction of patient survival from tumor molecular '-omics' data is a key step toward personalized medicine. Cox models fitted to RNA profiling datasets are popular for predicting clinical outcomes. However, these models are applied in a "high-dimensional" setting, where the number p of covariates (gene expression levels) greatly exceeds both the number n of patients and the number e of events. Pre-screening combined with penalization is therefore widely used for dimension reduction.
METHODS
In the present paper, (i) we benchmark the performance of lasso penalization and three variants (ridge, elastic net, and adaptive elastic net) on 16 cancers from The Cancer Genome Atlas (TCGA) after pre-screening, (ii) we propose a bi-dimensional pre-screening procedure for survival prediction, based on both gene variability and p-values from single-variable Cox models, and (iii) we compare our results with iterative sure independence screening (ISIS).
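A minimal sketch of how such a two-criterion filter could look is given here. It is an illustration under stated assumptions, not the authors' procedure: the abstract does not say how the two dimensions are combined, so the sketch first drops low-variance genes and then ranks the rest by p-value. The function names, the `var_quantile` threshold, and the use of a score test at beta = 0 in place of a fitted single-variable Cox model are all hypothetical choices made to keep the example dependency-free.

```python
import math


def variance(values):
    """Population variance of a list of numbers."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def cox_score_pvalue(x, time, event):
    """P-value of the score test for one covariate in a Cox model.

    Assumption: the score test at beta = 0 stands in for whichever
    single-variable Cox test the paper used. Tied event times are
    treated Breslow-style (each event sees the full risk set).
    """
    score, info = 0.0, 0.0
    for i in range(len(x)):
        if not event[i]:
            continue  # censored patients contribute only to risk sets
        # Risk set: patients still under observation at this event time.
        risk = [x[j] for j in range(len(x)) if time[j] >= time[i]]
        mean = sum(risk) / len(risk)
        score += x[i] - mean
        info += sum((r - mean) ** 2 for r in risk) / len(risk)
    if info == 0.0:
        return 1.0  # covariate carries no information
    chi2 = score * score / info
    # Survival function of a chi-square with 1 degree of freedom.
    return math.erfc(math.sqrt(chi2 / 2.0))


def bidimensional_prescreen(expr, time, event, var_quantile=0.5, top_k=200):
    """Return gene names that pass the variance filter, ranked by p-value.

    expr maps gene name -> list of expression values (one per patient).
    var_quantile is an illustrative threshold, not taken from the paper.
    """
    variances = {g: variance(v) for g, v in expr.items()}
    ranked = sorted(variances.values())
    cutoff = ranked[int(var_quantile * (len(ranked) - 1))]
    variable = [g for g in expr if variances[g] >= cutoff]
    pvals = {g: cox_score_pvalue(expr[g], time, event) for g in variable}
    return sorted(variable, key=lambda g: pvals[g])[:top_k]


# Toy data: six patients, all with observed events.
toy_time = [1, 2, 3, 4, 5, 6]
toy_event = [True] * 6
toy_expr = {
    "signal": [6, 5, 4, 3, 2, 1],  # high expression, early event
    "flat": [0] * 6,               # no variability
    "weak": [1, 2, 1, 2, 1, 2],    # variable but uninformative
}
print(bidimensional_prescreen(toy_expr, toy_time, toy_event, top_k=2))
```

The risk-set loop is quadratic in the number of patients per gene, which is acceptable for a sketch; a real analysis on TCGA-sized data would more likely rely on an established survival library.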
RESULTS
First, we show that integrating mRNA-seq data with clinical data improves predictions over clinical data alone. Second, our bi-dimensional pre-screening procedure yields only moderate improvements in the C-index and/or the integrated Brier score, while excluding genes that are irrelevant for prediction. Third, we show that the different penalization methods reach comparable prediction performance, with slight differences among datasets. Finally, we provide advice for the case of multi-omics data integration.
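For readers unfamiliar with the C-index mentioned above, the following sketch computes Harrell's concordance index in its simplest form. It is an illustrative implementation, not the evaluation code used in the study; the function name is hypothetical, pairs with tied follow-up times are skipped for simplicity, and tied risk scores count as half-concordant.

```python
def concordance_index(risk_score, time, event):
    """Harrell's C: fraction of comparable pairs ordered correctly.

    A pair (i, j) is comparable when the patient with the shorter
    follow-up time had an observed event. A higher risk_score is
    expected to go with shorter survival.
    """
    concordant, comparable = 0.0, 0
    n = len(time)
    for i in range(n):
        if not event[i]:
            continue  # a censored patient cannot anchor a comparable pair
        for j in range(n):
            if time[i] < time[j]:
                comparable += 1
                if risk_score[i] > risk_score[j]:
                    concordant += 1.0
                elif risk_score[i] == risk_score[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Four patients; the third is censored at time 6.
times = [2, 4, 6, 8]
events = [True, True, False, True]
scores = [0.9, 0.7, 0.8, 0.1]
print(concordance_index(scores, times, events))  # 0.8
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ranking, so a moderate gain in the C-index means a modest increase in the share of correctly ordered patient pairs.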
CONCLUSIONS
Tumor profiles convey more prognostic information than clinical variables such as stage for many cancer subtypes. Lasso and ridge penalizations perform similarly to elastic net penalizations for Cox models in high dimension. Pre-screening the top 200 genes in terms of single-variable Cox model p-values is a practical way to reduce dimension, which may be particularly useful when integrating multi-omics data.
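As a reminder of how the penalties compared here relate to one another, they can be written as special cases of a single penalized objective. The parameterization below follows a common glmnet-style convention and is an assumption; the abstract does not state which form the authors used. Here $\delta_i$ is the event indicator, $t_i$ the follow-up time, $x_i$ the covariate vector of patient $i$, and $w_j$ are penalty weights.

```latex
\ell(\beta) = \sum_{i:\,\delta_i = 1}
  \Bigl[ x_i^{\top}\beta - \log \sum_{j:\, t_j \ge t_i} \exp\bigl(x_j^{\top}\beta\bigr) \Bigr],
\qquad
\hat{\beta} = \arg\max_{\beta}\;
  \ell(\beta) - \lambda \Bigl[ \alpha \sum_{j=1}^{p} w_j\,\lvert\beta_j\rvert
  + \frac{1-\alpha}{2} \sum_{j=1}^{p} \beta_j^{2} \Bigr]
```

With $w_j = 1$, setting $\alpha = 1$ gives the lasso, $\alpha = 0$ gives ridge, and $0 < \alpha < 1$ gives the elastic net; the adaptive elastic net uses data-driven weights $w_j$, typically derived from an initial estimate of $\beta$.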