Gradient lasso for Cox proportional hazards model.

Authors

Sohn Insuk, Kim Jinseog, Jung Sin-Ho, Park Changyi

Affiliation

Department of Biostatistics & Bioinformatics, Duke University, NC 27705, USA.

Publication information

Bioinformatics. 2009 Jul 15;25(14):1775-81. doi: 10.1093/bioinformatics/btp322. Epub 2009 May 15.

Abstract

MOTIVATION

There has been increasing interest in expressing a survival phenotype (e.g. time to cancer recurrence or death), or its distribution, in terms of the expression data of a subset of genes. Due to the high dimensionality of gene expression data, however, there is a serious problem of collinearity in fitting a prediction model such as Cox's proportional hazards model. To avoid the collinearity problem, several methods based on penalized Cox proportional hazards models have been proposed. However, those methods suffer from severe computational problems, such as slow or even failed convergence, because model fitting requires high-dimensional matrix inversions. We propose to implement penalized Cox regression with a lasso penalty via the gradient lasso algorithm, which converges to the global optimum faster than other algorithms do. Moreover, the gradient lasso algorithm is guaranteed to converge to the optimum under mild regularity conditions. Hence, our gradient lasso algorithm can be a useful tool in developing a prediction model based on high-dimensional covariates, including gene expression data.
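To make the idea of fitting an L1-constrained Cox model without matrix inversion concrete, here is a minimal Python sketch. It is not the authors' gradient lasso algorithm and not the glcoxph implementation. It swaps in a Frank-Wolfe-style update with backtracking as a simple gradient-only stand-in. The function names (`cox_neg_log_partial_likelihood`, `fit_l1_constrained_cox`), the constraint form `||beta||_1 <= lam`, and the assumption of no tied event times are all illustrative choices, not taken from the paper.

```python
import numpy as np


def cox_neg_log_partial_likelihood(beta, X, time, event):
    """Return (loss, gradient) of the negative log partial likelihood.

    Assumes no tied event times (an illustrative simplification).
    """
    order = np.argsort(-time)              # latest time first
    Xs, ev = X[order], event[order]
    eta = Xs @ beta
    shift = eta.max()                      # stabilise the exponentials
    w = np.exp(eta - shift)
    cum_w = np.cumsum(w)                   # sum over each subject's risk set
    cum_wx = np.cumsum(w[:, None] * Xs, axis=0)
    log_risk = np.log(cum_w) + shift
    loss = -np.sum(ev * (eta - log_risk))
    grad = -np.sum(ev[:, None] * (Xs - cum_wx / cum_w[:, None]), axis=0)
    return loss, grad


def fit_l1_constrained_cox(X, time, event, lam, n_iter=200):
    """Frank-Wolfe-style sketch: minimise the loss subject to ||beta||_1 <= lam.

    Each iteration uses only the gradient; no Hessian is formed or inverted.
    """
    beta = np.zeros(X.shape[1])
    loss, grad = cox_neg_log_partial_likelihood(beta, X, time, event)
    for _ in range(n_iter):
        j = np.argmax(np.abs(grad))        # coordinate with the steepest slope
        vertex = np.zeros_like(beta)
        vertex[j] = -lam * np.sign(grad[j])
        step, improved = 1.0, False
        while step > 1e-8:                 # backtrack until the loss decreases
            cand = (1.0 - step) * beta + step * vertex
            cand_loss, cand_grad = cox_neg_log_partial_likelihood(
                cand, X, time, event
            )
            if cand_loss < loss:
                beta, loss, grad = cand, cand_loss, cand_grad
                improved = True
                break
            step *= 0.5
        if not improved:                   # no descent found: stop early
            break
    return beta
```

Because every iterate is a convex combination of points inside the L1 ball, the constraint holds throughout, and each iteration changes the weight on only one coordinate, which is what makes this kind of update cheap when the number of covariates is large.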

RESULTS

Results from simulation studies showed that the prediction model fitted by gradient lasso recovers the prognostic genes. Results from diffuse large B-cell lymphoma datasets and the Norway/Stanford breast cancer dataset also indicate that our method is very competitive with the popular existing methods of Park and Hastie and of Goeman in computational time, prediction and selectivity.

AVAILABILITY

R package glcoxph is available at http://datamining.dongguk.ac.kr/R/glcoxph.
