Department of Biostatistics, School of Public Health, Medical College of Soochow University, University of Alabama at Birmingham, Suzhou, 215123, China.
Jiangsu Key Laboratory of Preventive and Translational Medicine for Geriatric Diseases, Medical College of Soochow University, Suzhou, 215123, China.
BMC Bioinformatics. 2019 Feb 27;20(1):94. doi: 10.1186/s12859-019-2656-1.
Group structures among genes encoded in functional relationships or biological pathways are valuable and unique features in large-scale molecular data for survival analysis. However, most previous approaches to molecular data analysis ignore such group structures. It is desirable to develop powerful analytic methods that incorporate pathway information for predicting disease survival outcomes and detecting associated genes.
Here we propose a Bayesian hierarchical Cox survival model, called the group spike-and-slab lasso Cox (gsslasso Cox), for predicting disease survival outcomes and detecting associated genes by incorporating group structures of biological pathways. Our hierarchical model employs a novel prior on the coefficients of genes, i.e., the group spike-and-slab double-exponential distribution, to integrate group structures and to adaptively shrink the effects of genes. We have developed a fast and stable deterministic algorithm to fit the proposed models. We performed extensive simulation studies to assess the model fitting properties and the prognostic performance of the proposed method, and we also applied the method to three cancer data sets.
Both the theoretical and empirical studies show that the proposed method can induce weaker shrinkage on predictors in an active pathway, thereby incorporating the biological similarity of genes within the same pathway into the hierarchical modeling. Compared with several existing methods, the proposed method can more accurately estimate gene effects and can better predict survival outcomes. For the three cancer data sets, the results show that the proposed method generates more powerful models for predicting survival and detecting associated genes. The method has been implemented in the freely available R package BhGLM at https://github.com/nyiuab/BhGLM .
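The claim of weaker shrinkage in an active pathway can be illustrated with a small numerical sketch. The code below is not the BhGLM implementation; it assumes a common two-component formulation of a spike-and-slab double-exponential prior, in which a coefficient has a narrow "spike" scale `s0` or a wide "slab" scale `s1`, and a group-level inclusion probability `theta` stands in for how active a pathway is. The function names, the default scales, and the use of the expected inverse scale as an L1 penalty weight are all assumptions made for illustration.

```python
import math


def de_density(beta, scale):
    """Double-exponential (Laplace) density centered at 0 with the given scale."""
    return math.exp(-abs(beta) / scale) / (2.0 * scale)


def inclusion_probability(beta, theta, s0, s1):
    """Conditional probability that beta comes from the slab component.

    theta is the (assumed) group-level prior probability of the slab.
    """
    slab = theta * de_density(beta, s1)
    spike = (1.0 - theta) * de_density(beta, s0)
    return slab / (slab + spike)


def effective_penalty(beta, theta, s0=0.05, s1=1.0):
    """Expected inverse scale, used here as an adaptive L1 penalty weight.

    Values near 1/s0 mean strong shrinkage; values near 1/s1 mean weak shrinkage.
    """
    p = inclusion_probability(beta, theta, s0, s1)
    return (1.0 - p) / s0 + p / s1


if __name__ == "__main__":
    # Same coefficient value, two hypothetical pathways with different activity.
    for theta in (0.1, 0.8):
        print(theta, effective_penalty(0.1, theta))
```

Under these assumptions, a gene with the same current coefficient estimate receives a smaller penalty when its pathway has a higher inclusion probability, and larger coefficients are penalized less than coefficients near zero, which is the adaptive behavior the abstract describes.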