High-dimensional Cox models: the choice of penalty as part of the model building process.

Authors

Benner Axel, Zucknick Manuela, Hielscher Thomas, Ittrich Carina, Mansmann Ulrich

Affiliation

Division of Biostatistics, German Cancer Research Center, Heidelberg, Germany.

Publication information

Biom J. 2010 Feb;52(1):50-69. doi: 10.1002/bimj.200900064.

DOI:10.1002/bimj.200900064

PMID:20166132

Abstract

The Cox proportional hazards regression model is the most popular approach to model covariate information for survival times. In this context, the development of high-dimensional models where the number of covariates is much larger than the number of observations (p>>n) is an ongoing challenge. A practicable approach is to use ridge penalized Cox regression in such situations. Besides focusing on finding the best prediction rule, one is often interested in determining a subset of covariates that are the most important ones for prognosis. This could be a gene set in the biostatistical analysis of microarray data. Covariate selection can then, for example, be done by L1-penalized Cox regression using the lasso (Tibshirani (1997). Statistics in Medicine 16, 385-395). Several approaches beyond the lasso that incorporate covariate selection have been developed in recent years. These include modifications of the lasso as well as nonconvex variants such as smoothly clipped absolute deviation (SCAD) (Fan and Li (2001). Journal of the American Statistical Association 96, 1348-1360; Fan and Li (2002). The Annals of Statistics 30, 74-99). The purpose of this article is to integrate them into the model building process when analyzing high-dimensional data with the Cox proportional hazards model. To evaluate penalized regression models beyond the lasso, we included SCAD variants and the adaptive lasso (Zou (2006). Journal of the American Statistical Association 101, 1418-1429). We compare them with "standard" applications such as ridge regression, the lasso, and the elastic net. Predictive accuracy, features of variable selection, and estimation bias are studied to assess the practical use of these methods. We observed that the performance of SCAD and the adaptive lasso is highly dependent on nontrivial preselection procedures. A practical solution to this problem does not yet exist.
Since there is a high risk of missing relevant covariates when SCAD or the adaptive lasso is applied after an inappropriate initial selection step, we recommend staying with the lasso or the elastic net in actual data applications. However, given the promising results for truly sparse models, we see some advantage in SCAD and the adaptive lasso if better preselection procedures were available. This requires further methodological research.
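
To make the penalties named in the abstract concrete, the following is a minimal sketch of the penalty terms only, not the authors' implementation and not a Cox model fit. It assumes the commonly used textbook forms: the SCAD penalty of Fan and Li (2001) with the conventional shape parameter a = 3.7, an elastic net parameterised by a mixing weight `alpha` (one common convention; others exist), and adaptive lasso weights of the form 1/|initial estimate|^gamma. All function and parameter names are hypothetical and chosen for illustration.

```python
import math


def ridge_penalty(beta, lam):
    """Ridge (L2) penalty: shrinks coefficients but never sets them to zero."""
    return lam * sum(b * b for b in beta)


def lasso_penalty(beta, lam):
    """Lasso (L1) penalty: can set coefficients exactly to zero."""
    return lam * sum(abs(b) for b in beta)


def elastic_net_penalty(beta, lam, alpha=0.5):
    """Elastic net: mix of L1 and L2 (assumed convention; alpha=1 is the lasso)."""
    l1 = sum(abs(b) for b in beta)
    l2 = sum(b * b for b in beta)
    return lam * (alpha * l1 + 0.5 * (1.0 - alpha) * l2)


def scad_penalty_single(b, lam, a=3.7):
    """SCAD penalty for one coefficient (nonconvex, requires a > 2).

    Behaves like the lasso near zero, then flattens so that large
    coefficients are not penalised further.
    """
    t = abs(b)
    if t <= lam:
        return lam * t
    if t <= a * lam:
        return (2.0 * a * lam * t - t * t - lam * lam) / (2.0 * (a - 1.0))
    return lam * lam * (a + 1.0) / 2.0


def scad_penalty(beta, lam, a=3.7):
    """Sum of the SCAD penalty over all coefficients."""
    return sum(scad_penalty_single(b, lam, a) for b in beta)


def adaptive_lasso_penalty(beta, beta_init, lam, gamma=1.0):
    """Adaptive lasso: L1 penalty with weights 1 / |initial estimate|**gamma.

    A covariate whose initial estimate is exactly zero gets an infinite
    weight, so it can never re-enter the model.
    """
    total = 0.0
    for b, b0 in zip(beta, beta_init):
        if b == 0:
            continue  # a zero coefficient contributes nothing
        if b0 == 0:
            return math.inf  # excluded by the initial step
        total += abs(b) / abs(b0) ** gamma
    return lam * total


if __name__ == "__main__":
    beta = [0.0, 0.5, -2.0, 10.0]
    print("ridge:      ", ridge_penalty(beta, 1.0))
    print("lasso:      ", lasso_penalty(beta, 1.0))
    print("elastic net:", elastic_net_penalty(beta, 1.0, alpha=0.5))
    print("SCAD:       ", scad_penalty(beta, 1.0))
    print("adaptive:   ", adaptive_lasso_penalty([1.0, 1.0], [0.0, 2.0], 1.0))
```

In this sketch the adaptive lasso penalty becomes infinite for any nonzero coefficient whose initial estimate was zero, which illustrates why a poor initial selection step can permanently exclude a relevant covariate. In a penalized Cox fit, one of these terms would be subtracted from the partial log-likelihood before maximisation; that step is not shown here.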

Similar articles

High-dimensional Cox models: the choice of penalty as part of the model building process.

Biom J. 2010 Feb;52(1):50-69. doi: 10.1002/bimj.200900064.

L1 penalized estimation in the Cox proportional hazards model.

Biom J. 2010 Feb;52(1):70-84. doi: 10.1002/bimj.200900028.

Penalized Cox regression analysis in the high-dimensional and low-sample size settings, with applications to microarray gene expression data.

Bioinformatics. 2005 Jul 1;21(13):3001-8. doi: 10.1093/bioinformatics/bti422. Epub 2005 Apr 6.

Gradient lasso for Cox proportional hazards model.

Bioinformatics. 2009 Jul 15;25(14):1775-81. doi: 10.1093/bioinformatics/btp322. Epub 2009 May 15.

Variable selection for proportional odds model.

Stat Med. 2007 Sep 10;26(20):3771-81. doi: 10.1002/sim.2833.

Predicting survival from microarray data--a comparative study.

Bioinformatics. 2007 Aug 15;23(16):2080-7. doi: 10.1093/bioinformatics/btm305. Epub 2007 Jun 6.

Predicting patient survival from microarray data by accelerated failure time modeling using partial least squares and LASSO.

Biometrics. 2007 Mar;63(1):259-71. doi: 10.1111/j.1541-0420.2006.00660.x.

Partial Cox regression analysis for high-dimensional microarray gene expression data.

Bioinformatics. 2004 Aug 4;20 Suppl 1:i208-15. doi: 10.1093/bioinformatics/bth900.

Cox survival analysis of microarray gene expression data using correlation principal component regression.

Stat Appl Genet Mol Biol. 2007;6:Article16. doi: 10.2202/1544-6115.1153. Epub 2007 May 29.

Extended follow-up and spatial analysis of the American Cancer Society study linking particulate air pollution and mortality.

Res Rep Health Eff Inst. 2009 May(140):5-114; discussion 115-36.

Cited by

Predictive Models for Long Term Survival of AML Patients Treated with Venetoclax and Azacitidine or 7+3 Based on Post Treatment Events and Responses: Retrospective Cohort Study.

JMIR Cancer. 2024 Aug 21;10:e54740. doi: 10.2196/54740.

Learning from vertically distributed data across multiple sites: An efficient privacy-preserving algorithm for Cox proportional hazards model with variable selection.

J Biomed Inform. 2024 Jan;149:104581. doi: 10.1016/j.jbi.2023.104581. Epub 2023 Dec 23.

Comparison of models for stroke-free survival prediction in patients with CADASIL.

Sci Rep. 2023 Dec 17;13(1):22443. doi: 10.1038/s41598-023-49552-w.

Penalized variable selection in multi-parameter regression survival modeling.

Stat Methods Med Res. 2023 Dec;32(12):2455-2471. doi: 10.1177/09622802231203322. Epub 2023 Oct 12.

Target Genes of c-MYC and MYCN with Prognostic Power in Neuroblastoma Exhibit Different Expressions during Sympathoadrenal Development.

Cancers (Basel). 2023 Sep 16;15(18):4599. doi: 10.3390/cancers15184599.

An immune risk score predicts progression-free survival of melanoma patients in South China receiving anti-PD-1 inhibitor therapy-a retrospective cohort study examining 66 circulating immune cell subsets.

Front Immunol. 2022 Dec 7;13:1012673. doi: 10.3389/fimmu.2022.1012673. eCollection 2022.

Survival analysis of localized prostate cancer with deep learning.

Sci Rep. 2022 Oct 24;12(1):17821. doi: 10.1038/s41598-022-22118-y.

Prognosis of lasso-like penalized Cox models with tumor profiling improves prediction over clinical data alone and benefits from bi-dimensional pre-screening.

BMC Cancer. 2022 Oct 5;22(1):1045. doi: 10.1186/s12885-022-10117-1.

Development and Validation of Nomograms to Predict Overall Survival and Cancer-Specific Survival in Patients With Pancreatic Adenosquamous Carcinoma.

Front Oncol. 2022 Mar 7;12:831649. doi: 10.3389/fonc.2022.831649. eCollection 2022.

Identifying miRNA-mRNA Integration Set Associated With Survival Time.

Front Genet. 2021 Jun 29;12:634922. doi: 10.3389/fgene.2021.634922. eCollection 2021.
