Liu Zhenqiu, Magder Laurence S, Hyslop Terry, Mao Li
Greenebaum Cancer Center, University of Maryland, 22 South Greene Street, Baltimore, MD 21201, USA.
Algorithms Mol Biol. 2010 Aug 16;5:30. doi: 10.1186/1748-7188-5-30.
It has been demonstrated that genes in a cell do not act independently. They interact with one another to complete certain biological processes or to implement certain molecular functions. Incorporating biological pathways or functional groups into the model and identifying survival-associated gene pathways remains a challenging problem. In this paper, we propose a novel iterative gradient-based method for survival analysis with group Lp penalized global AUC summary maximization. Unlike LASSO, the Lp penalty (p < 1), of which adaptive LASSO is a special implementation, is asymptotically unbiased and has oracle properties [1]. We first extend the Lp penalty for individual gene identification to a group Lp penalty for pathway selection, and then develop a novel iterative gradient algorithm for penalized global AUC summary maximization (IGGAUCS). This method incorporates genetic pathways into global AUC summary maximization and identifies survival-associated pathways instead of individual genes. The tuning parameters are determined using 10-fold cross-validation on the training data only, and prediction performance is evaluated on test data. We apply the proposed method to survival outcome analysis with gene expression profiles and identify multiple pathways simultaneously. Experimental results with simulated and gene expression data demonstrate that the proposed procedures can be used to identify important biological pathways related to the survival phenotype and to build a parsimonious model for predicting survival times.
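To make the idea of a group Lp penalty concrete, here is a minimal Python sketch. The abstract does not state the exact functional form, so this assumes one common formulation: the sum over pathways of the L2 norm of that pathway's coefficients raised to the power p. The function name `group_lp_penalty`, its parameters, and the example weights and groups are hypothetical illustrations, not the authors' implementation.

```python
import math


def group_lp_penalty(weights, groups, p=0.5):
    """Assumed form: sum over groups of (L2 norm of the group's weights) ** p.

    weights: list of per-gene coefficients.
    groups:  list of index lists, one per pathway (hypothetical layout).
    p:       exponent; p = 1 gives the unweighted group-LASSO penalty,
             and p < 1 gives a non-convex penalty.
    """
    total = 0.0
    for indices in groups:
        # L2 norm of the coefficients that belong to this pathway
        norm = math.sqrt(sum(weights[i] ** 2 for i in indices))
        total += norm ** p
    return total


# Two hypothetical pathways: one active, one with all-zero coefficients.
weights = [3.0, 4.0, 0.0, 0.0]
groups = [[0, 1], [2, 3]]
penalty = group_lp_penalty(weights, groups, p=0.5)
```

Under this assumed form, a pathway whose coefficients are all zero contributes nothing to the penalty, which is what allows whole pathways, rather than individual genes, to be dropped from the model.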