Wang Zhu, Wang C Y
Yale University, USA.
Stat Appl Genet Mol Biol. 2010;9(1):Article24. doi: 10.2202/1544-6115.1550. Epub 2010 Jun 8.
There has been increasing interest in predicting patients' survival after therapy from gene expression microarray data. In regression and classification models with high-dimensional genomic data, boosting has been successfully applied to build accurate predictive models and conduct variable selection simultaneously. We propose Buckley-James boosting for semiparametric accelerated failure time models with right-censored survival data, which can be used to predict the survival of future patients from high-dimensional genomic data. In the spirit of the adaptive LASSO, twin boosting is also incorporated to fit sparser models. The proposed methods provide a unified approach to fitting linear models and non-linear effects models with possible interactions, and they perform variable selection and parameter estimation simultaneously. The proposed methods are evaluated by simulations and applied to a recent microarray gene expression data set for patients with diffuse large B-cell lymphoma under the current gold standard therapy.
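To make the idea concrete, the following is a minimal sketch of one simple way to combine Buckley-James imputation with componentwise least-squares boosting for a linear accelerated failure time model. It is an illustration under stated assumptions, not the authors' implementation: the function names (`km_jumps`, `bj_impute`, `bj_boost`), the step size `nu`, the fixed iteration count, the choice to re-impute before every boosting step, and the convention of treating the largest residual as uncensored are all choices made here for illustration. The input `y` is assumed to be the log of the observed time, `delta` is 1 for an observed event and 0 for a censored one, and no column of `X` is assumed to be constant.

```python
import numpy as np

def km_jumps(resid, delta):
    """Kaplan-Meier probability mass at each residual, in sorted order.
    Censored residuals receive zero mass. Assumption: the largest residual
    is treated as uncensored so that the masses sum to one."""
    order = np.argsort(resid, kind="stable")
    d = delta[order].astype(float)
    d[-1] = 1.0
    n = len(resid)
    at_risk = n - np.arange(n)                # number still at risk at each point
    surv = np.cumprod(1.0 - d / at_risk)      # survival estimate after each point
    surv_prev = np.concatenate(([1.0], surv[:-1]))
    return order, surv_prev - surv

def bj_impute(y, delta, fitted):
    """Buckley-James step: replace each censored response by
    fitted + E[residual | residual > observed residual]."""
    resid = y - fitted
    order, mass = km_jumps(resid, delta)
    e = resid[order]
    # Suffix sums over strictly later positions in the sorted order.
    tail_me = np.concatenate((np.cumsum((mass * e)[::-1])[::-1][1:], [0.0]))
    tail_m = np.concatenate((np.cumsum(mass[::-1])[::-1][1:], [0.0]))
    safe = np.where(tail_m > 0, tail_m, 1.0)
    cond = np.where(tail_m > 0, tail_me / safe, e)
    y_star_sorted = np.where(delta[order] == 1, y[order], fitted[order] + cond)
    y_star = np.empty(len(y), dtype=float)
    y_star[order] = y_star_sorted
    return y_star

def bj_boost(X, y, delta, n_iter=200, nu=0.1):
    """Componentwise L2 boosting on Buckley-James imputed responses.
    Each iteration re-imputes, then updates the single best coefficient."""
    n, p = X.shape
    mu = X.mean(axis=0)
    Xc = X - mu                               # centered covariates
    ss = (Xc ** 2).sum(axis=0)                # assumed strictly positive
    beta = np.zeros(p)
    intercept = y.mean()
    fitted = np.full(n, intercept)
    for _ in range(n_iter):
        y_star = bj_impute(y, delta, fitted)
        intercept = y_star.mean()
        u = y_star - intercept - Xc @ beta    # current residuals
        coef = Xc.T @ u / ss                  # univariate least-squares fits
        j = int(np.argmax(coef ** 2 * ss))    # largest reduction in squared error
        beta[j] += nu * coef[j]
        fitted = intercept + Xc @ beta
    return intercept, beta, mu
```

Because only one coefficient moves per iteration, covariates that are never selected keep a coefficient of exactly zero, which is how boosting of this kind performs variable selection and estimation together. In practice the number of iterations would be tuned, for example by cross-validation, rather than fixed as it is here.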